Viale Premuda 14, 20129 Milano - academy@digiacademy.it - 0250030724

Tech Data


Big Data

Course code: TDAI006
Course duration: 2 days

Learn how big data is driving organizational change, and get to know the essential analytical tools and techniques. Understand what big data is and how it will affect your business, working with the tools and systems used by big data scientists and engineers.

 

Chapter 1. Defining Big Data

  • In-Class Discussion
  • Gartner's Definition of Big Data
  • More Definitions of Big Data
  • Transforming Data into Business Information
  • Challenges Posed by Big Data
  • Processing Big Data
  • Apache Hadoop
  • The Cloud and Big Data
  • The CAP Theorem
  • Summary

Chapter 2. Hadoop Overview

  • The Client-Server Processing Pattern
  • Apache Hadoop
  • Apache Hadoop Logo
  • Typical Hadoop Applications
  • Hadoop Clusters
  • Hadoop Distributions
  • Hadoop's Main Components
  • HDFS
  • HDFS Blocks
  • YARN
  • Hadoop-based Systems for Data Analysis
  • MapReduce
  • Similarity with SQL Aggregation Operations
  • Distributed Computing Economics
  • Discussion: Divide and Conquer
  • Apache Pig
  • Pig Latin
  • Running Pig
  • Pig Latin Script Example
  • What is Hive?
  • Hive's Value Proposition
  • Who uses Hive?
  • What Hive Does Not Have
  • HiveQL
  • Working with Hive Tables
  • What is HBase?
  • HBase vs RDBMS
  • Interfacing with HBase
  • HBase Table Design Digest
  • A Cell's Value Versioning
  • Creating and Populating a Table in HBase Shell
  • Getting a Cell's Value
  • Counting Rows in an HBase Table
  • Summary

Chapter 3. Big Data Analytics in the Cloud

  • Data is King
  • Big Data Stores in the Cloud 
  • Example: AWS Simple Storage Service (S3) 
  • MapReduce (and Hadoop) in the Cloud 
  • Information and Data Security
  • Data-at-rest Security Examples
  • Example of Object Encryption in S3
  • One S3 Use Case: Backup and Archiving
  • Data Analytics Services in the Cloud
  • Analytics Services with AWS
  • AWS EMR: Software Configuration Screen
  • AWS EMR: Hardware Configuration Screen
  • Big Data Analytics Solutions from Google Cloud
  • Google Data Processing and Analytics Pipelines
  • Google BigQuery
  • Machine Learning
  • Microsoft Azure ML Studio
  • Machine Learning Pipeline
  • Summary

Chapter 4. Techniques for Making Big Data Small

  • What is Data Science?
  • Data Science, Machine Learning, AI?
  • Making Big Data Small
  • Descriptive Statistics
  • Correlation
  • Reducing the Number of Data Attributes
  • Lasso Regularization
  • Sampling Examples
  • Data Compression
  • Summary

Chapter 5. Introduction to Apache Spark

  • What is Apache Spark?
  • Where to Get Spark?
  • The Spark Platform
  • Spark Logo
  • Common Spark Use Cases
  • Running Spark on a Cluster
  • The Driver Process
  • Spark Shell
  • Interfaces with Data Storage Systems
  • Limitations of Hadoop's MapReduce
  • Spark vs MapReduce
  • The Resilient Distributed Dataset (RDD)
  • Spark Streaming (Micro-batching)
  • Spark SQL
  • Example of Spark SQL
  • Spark Machine Learning Library
  • Example: Using Random Forests with Spark MLlib
  • The Output (the “Confusion” matrix)
  • Dumping the Trained Model
  • Clustering
  • Finding Centroids Example
  • Using kMeans Module with Spark MLlib
  • Printing the Centroids
  • GraphX

 


