About the Program
About The Certified Big Data Analytics Program
Big Data is data that is too large and complex for conventional data tools to capture, store and analyze. When put to good use, Big Data allows analysts to spot trends, extract insights and make predictions.
This course, developed by our industry experts, will help you build the core competencies expected of a Big Data Analytics professional: effectively mining, manipulating and analyzing Big Data using basic and advanced analytical techniques.
It is a completely industry-relevant Big Data Analytics training program that blends analytics and technology, making it well suited to aspirants who want to develop Big Data Analytics skills and get a head start in Big Data.
Course Objective
The objective of the course is to help attendees understand big data and how to store, manage and process it using big data technologies such as Hadoop and the Hadoop ecosystem. In this Big Data training, attendees will gain a detailed, practical skill set in Hadoop and its ecosystem, including HDFS, MapReduce, Spark (Core, SQL, MLlib, GraphX), Pig, Hive, Impala, HBase, Sqoop, Flume, Oozie, ZooKeeper and Storm.
The course also covers hands-on integration of Hadoop with Spark, and includes an introduction to machine learning.
At the end of the program, candidates are awarded the Certified Big Data Analyst credential on successful completion of the projects provided as part of the training. Optionally, candidates can also sit for the Cloudera or Hortonworks Big Data Hadoop certification after this course.
Taken together, the course is designed to help you emerge as an industry-ready professional in the field of Big Data Analytics.
Who should do this course?
Candidates from quantitative backgrounds such as Engineering, Finance, Mathematics, Statistics or Business Management who want a head start in an analytics career. IT/ITES, data analytics, Business Intelligence and database professionals, as well as computer science graduates (or those from any other circuit branch), who want to move into a Big Data Analytics or Developer role.
Who are the trainers?
Our trainers are highly qualified industry experts and certified instructors with more than 10 years of global analytical experience.
Prerequisites
Knowledge of Excel is mandatory, and a quantitative background is preferred. Knowledge of any programming language and exposure to data analytics would be an advantage.
Project - Case Studies
Data storage using HDFS
This case study aims to give practical experience in storing and managing different types of data (structured, semi-structured and unstructured), both compressed and uncompressed.
Processing data using MapReduce
This case study aims to give practical experience in understanding and developing MapReduce programs in Java and R, and in running streaming jobs from the terminal and from Eclipse.
Data integration using Sqoop & Flume
This case study aims to give practical experience in extracting data from Oracle and loading it into HDFS (and vice versa), and in extracting data from Twitter and storing it in HDFS.
Data Analysis using Pig
This case study aims to give practical experience in complete data analysis using Pig, including creating and using user-defined functions (UDFs).
Data Analysis using Hive
This case study aims to give practical experience in complete data analysis using Hive, including creating and using user-defined functions (UDFs).
HBase: NoSQL database creation
This case study aims to give practical experience in creating data tables and clusters using HBase.
Final Project: Integration of Hadoop components
The final project aims to give practical experience in how different modules (Pig, Hive, HBase) can be used together to solve big data problems.
Exam & Certification
The certification is provided by Databyte Academy
Upon successful completion of the program, students will be conferred with dual certification:
- Certificate of Completion
- CERTIFIED BIG DATA ANALYTICS*
To be “Certified” as part of the course, students need to complete the assignments and the examination. The certificate is awarded once all assignments have been submitted and evaluated.
New Intake:
Commencing soon
Certified Big Data Analytics
Course ID – CBDA
Duration – 40 Hours
Classes – 5 Days
Tools – HADOOP & SPARK
Learning Mode – Instructor-led Classroom Training
Next Batch – Full Time
Course Outcome
The ability to understand big data and to use Big Data ecosystem tools to store and process it, along with hands-on exposure to using big data technology to improve performance across business functions by storing, managing and processing big data efficiently.
Course Content
Data analysis, as the name implies, examines data to discover trends. It is used not only in economics and finance but also in fields such as law, healthcare, public administration, politics, telecom, social media, manufacturing and banking, where organizations rely on quality data analysis to make strategic business decisions. Working professionals can improve their resumes and job prospects by earning a certificate in data analytics.
Introduction to Big Data
- Introduction and relevance
- Uses of Big Data analytics in various industries such as Telecom, E-commerce, Finance and Insurance
- Problems with Traditional Large-Scale Systems
Hadoop (Big Data) Ecosystem
- Motivation for Hadoop
- Different types of projects by Apache
- Role of projects in the Hadoop Ecosystem
- Key technology foundations required for Big Data
- Limitations of existing data analytics architectures and their solutions
- Comparison of traditional data management systems with Big Data management systems
- Evaluate key framework requirements for Big Data analytics
- Hadoop Ecosystem & Hadoop 2.x core components
- Explain the relevance of real-time data
- Explain how to use big and real-time data as a Business planning tool
Hadoop Cluster: Architecture and Configuration Files
- Hadoop Master-Slave Architecture
- The Hadoop Distributed File System – Concept of data storage
- Explain different types of cluster setups (fully distributed, pseudo-distributed, etc.)
- Hadoop cluster set up – Installation
- Hadoop 2.x Cluster Architecture
- A Typical enterprise cluster – Hadoop Cluster Modes
- Understanding cluster management tools like Cloudera Manager/Apache Ambari
Hadoop Core Components: HDFS & MapReduce (YARN)
- HDFS Overview & Data storage in HDFS
- Get data into Hadoop from the local machine, and back again (Data Loading Techniques)
- MapReduce Overview (Traditional way vs. MapReduce way): Concept of Mapper & Reducer
- Understanding the MapReduce program framework
- Develop MapReduce program using Java (Basic)
- Develop MapReduce program with the Streaming API (Basic)
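The mapper/reducer and Streaming API topics above can be illustrated with the classic word count. In Hadoop Streaming, the mapper and reducer are ordinary programs that read lines on standard input and write tab-separated key/value pairs to standard output, and Hadoop sorts the mapper output by key before the reducer sees it. The sketch below assumes Python 3.9+; the script name `wordcount.py` and its `map`/`reduce` arguments are hypothetical choices for this example, not part of the course material.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map phase: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Reduce phase: sum the counts per word.

    Assumes the records arrive sorted by key, as Hadoop's shuffle/sort
    step guarantees, so all records for one word are adjacent.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines: Iterable[str]) -> list[str]:
    """Simulate map -> shuffle/sort -> reduce on a single machine."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CLI: `python3 wordcount.py map` or `python3 wordcount.py reduce`
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

Locally, `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce` mimics the three phases. On a cluster, the same script would typically be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options; the jar's location depends on the Hadoop distribution.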
Data Integration Using Sqoop & Flume
- Integrating Hadoop into an Existing Enterprise
- Loading Data from an RDBMS into HDFS by Using Sqoop
- Managing Real-Time Data Using Flume
- Accessing HDFS from Legacy Systems
Data Analysis Using PIG
- Introduction to Data Analysis Tools
- Apache Pig: MapReduce vs. Pig, Pig Use Cases
- Pig’s Data Model
- Pig Streaming
- Pig Latin Program & Execution
- Pig Latin: Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators, Pig UDF
- Writing Java UDFs
- Embedding Pig in Java
- Pig Macros
- Parameter Substitution
- Use Pig to automate the design and implementation of MapReduce applications
- Use Pig to apply structure to unstructured Big Data
Data Analysis Using HIVE
- Apache Hive: Hive vs. Pig, Hive Use Cases
- Discuss the Hive data storage principle
- Explain the file formats and record formats supported by the Hive environment
- Perform operations with data in Hive
- Hive QL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts
- Hive Script, Hive UDF
- Hive Persistence formats
- Loading data in Hive – Methods
- Serialization & Deserialization
- Handling Text data using Hive
- Integrating external BI tools with Hadoop Hive
Data Analysis Using IMPALA
- Introduction to Impala & Architecture
- How Impala executes Queries and its importance
- Hive vs. Pig vs. Impala
- Extending Impala with User Defined functions
Introduction to Other Ecosystem Tools
- NoSQL database: HBase introduction
- Oozie
SPARK Introduction
- Introduction to Apache Spark
- Streaming Data vs. In-Memory Data
- MapReduce vs. Spark
- Modes of Spark
- Spark Installation Demo
- Overview of Spark on a cluster
- Spark Standalone Cluster
SPARK in Practice
- Invoking Spark Shell
- Creating the Spark Context
- Loading a File in Shell
- Performing Some Basic Operations on Files in Spark Shell
- Caching Overview
- Distributed Persistence
- Spark Streaming Overview (Example: Streaming Word Count)
SPARK meets HIVE
- Analyze Hive and Spark SQL Architecture
- Analyze Spark SQL
- Context in Spark SQL
- Implement a sample example for Spark SQL
- Integrating Hive and Spark SQL
- Support for JSON and Parquet File Formats
- Implement Data Visualization in Spark
- Loading of Data
- Hive Queries through Spark
- Performance Tuning Tips in Spark
- Shared Variables: Broadcast Variables & Accumulators
SPARK streaming
- Extract and analyze data from Twitter using Spark Streaming
- Comparison of Spark and Storm – Overview
SPARK GraphX
- Overview of the GraphX module in Spark
- Creating graphs with GraphX
Implement Machine Learning Using Spark
- Brief introduction to Machine learning framework
- Implement some of the ML algorithms using Spark MLlib (ML is not covered in detail in this course; for Machine Learning concepts, please refer to the Advance Big Data Science course or the Machine Learning Specialization course)
Final Project
- Consolidate everything learned in the course
- Work on a Big Data project that integrates various key components
Get Ahead with Databyte’s Certificate
Earn your Certificate
Our Certified Big Data Analytics program is exhaustive and this certificate is proof that you have taken a big leap in mastering the domain.
Differentiate yourself with a Certified Big Data Analytics credential
The knowledge and skills you’ve gained working on projects, simulations and case studies will set you ahead of the competition.
Share your achievement
Talk about it on LinkedIn, Twitter or Facebook, boost your resume or frame it, and tell your friends and colleagues about it.