Databyte Academy

About the Program

About The Certified Big Data Analytics Program

Big Data is data that is too large and complex for conventional data tools to capture, store and analyze. When put to good use, Big Data allows analysts to spot trends, extract insights and make predictions.

This course, developed by our industry experts, will help you build the core competencies expected of a Big Data analyst: effectively mining, manipulating and analyzing Big Data using basic and advanced analytical techniques.

This is a fully industry-relevant Big Data Analytics training that blends analytics and technology, making it well suited to aspirants who want to develop Big Data Analytics skills and get a head start in Big Data.

Course Objective

The objective of the course is to understand big data and how to store, manage and process it using big data technologies like Hadoop and the Hadoop Ecosystem. In this Big Data training, attendees will gain a practical skill set across Hadoop and the Hadoop Ecosystem in detail, including HDFS, MapReduce, Spark (Core, SQL, MLlib, GraphX), Pig, Hive, Impala, HBase, Sqoop, Flume, Oozie, ZooKeeper and Storm.

The course also covers Spark, along with hands-on integration of Hadoop with Spark. An introduction to machine learning is included as well.

At the end of the program, candidates are awarded the Certified Big Data Analyst certification on successful completion of the projects provided as part of the training. Optionally, candidates can also appear for the Cloudera or Hortonworks Big Data Hadoop certification after this course.

This course covers everything you need to emerge as an industry-ready professional in the field of Big Data Analytics.

Who should do this course?

Candidates from quantitative backgrounds such as Engineering, Finance, Maths, Statistics or Business Management who want a head start on a career in analytics. Professionals in IT/ITES, data analytics, Business Intelligence, databases or computer science (or any other circuit branch) who want to move into a Big Data Analytics/Developer role.

Who are the trainers?

Our trainers are highly qualified industry experts and certified instructors with more than 10 years of global analytical experience.

Prerequisites

Knowledge of Excel is mandatory and a quantitative background is preferred. Knowledge of any programming language and exposure to data analytics would be an advantage.

Project - Case Studies

Data storage using HDFS
This case study aims to give practical experience in storing and managing different types of data (structured, semi-structured and unstructured), both compressed and uncompressed.

Processing data using MapReduce
This case study aims to give practical experience in understanding and developing MapReduce programs in Java & R and running streaming jobs from the terminal & Eclipse.

Data integration using Sqoop & Flume
This case study aims to give practical experience in extracting data from Oracle and loading it into HDFS (and vice versa), and in extracting data from Twitter and storing it in HDFS.

Data Analysis using Pig
This case study aims to give practical experience in complete data analysis using Pig, including creating and using user-defined functions (UDFs).

Data Analysis using Hive
This case study aims to give practical experience in complete data analysis using Hive, including creating and using user-defined functions (UDFs).

HBase – NoSQL database creation
This case study aims to give practical experience in data table/cluster creation using HBase.

Final Project: Integration of Hadoop components
The final project aims to give practical experience in how different modules (Pig, Hive, HBase) can be used together to solve big data problems.

Databyte Academy Instructors

Learn from practitioners, not from trainers.

Sumeet Bansal

CEO & Co-Founder of Analytixlabs

Sumeet is a former Business Consultant and has worked with prestigious companies like McKinsey & Company, ZS Associates and AbsolutData in the past 8 years. He has worked in more than 10 countries across the globe and is an expert in Business and Big Data Analytics.


Chandra Mouli

Chief Data Scientist

Chandra Mouli is a former Business Consultant/Data Scientist and has worked with prestigious companies like McKinsey and Genpact over the past 10 years. He has worked for clients across the globe and is an expert in Business and Big Data Analytics.


Ankita Gupta

Principal Consultant

Ankita is a former Analytics Consultant and has worked with prestigious companies like McKinsey & Company and Fidelity Investments in the past 10 years. She has worked in more than 10 countries across the globe and is an expert in Business and Marketing Analytics.


Sunit Prasad

Data Scientist

Sunit Prasad is a Senior Consultant and has worked on various projects in Analytics from Banking and Insurance domains, Risk Analytics, Social Media Analytics and Software Development in the past 5 years.


Ankur Agarwal

Analytics Consultant

Ankur is a consultant who has worked with multinational companies like NIIT Tech (India, USA), EXL Inductis and IMS Health over the past 6 years, serving Fortune 50 clients across geographies such as the USA and Italy as an expert in statistical business analysis and data analytics.


Manuj

Data Analytics

Manuj is an Analytics Consultant who has worked with prestigious companies like American Express, Eclerx Services and Infosys. Over the past 8 years he has worked for clients across the globe and is an expert in Business and Data Analytics.


Arun Pawar

Data Analytics

Arun Pawar is an Analytics Consultant with more than 11 years of corporate experience. He has worked with prestigious companies like Tech Mahindra, Accenture, British Telecom, Evalueserve and Mercer, has served clients across the globe, and is an expert in Business and Data Analytics.

Exam & Certification

The certification is provided by Databyte Academy.

Upon successful completion of the program, students will be conferred with dual certification:

  1. Certificate of Completion
  2. CERTIFIED BIG DATA ANALYTICS*

In order to be “Certified” as part of the course, students need to complete the assignments and examination. Once all your assignments are submitted and evaluated, the certificate shall be awarded.

Certified Big Data Analytics
Course ID – CBDA
Duration – 40 Hours
Classes – 5 Days
Tools – HADOOP & SPARK
Learning Mode – Instructor-Led Classroom Training
Next Batch – To be commenced soon

Course Outcome

By the end of the course, you will be able to understand big data and use Big Data ecosystem tools to store and process it. You will also get hands-on exposure to using big data technology to improve performance across functions by storing, managing and processing big data efficiently.

Course Content

The field of data analysis, as the name implies, analyses data to discover trends. It has tremendous uses not only in the economic and financial sectors but also in fields like law, healthcare, public administration, politics, telecom, social media, manufacturing, and banking and financial institutions, all of which rely on quality data analysis to arrive at strategic business decisions. Working professionals can improve their resume and job prospects by earning a certificate in data analytics.

Introduction to Big Data

  1. Introduction and relevance
  2. Uses of Big Data analytics in various industries like Telecom, E-commerce, Finance and Insurance
  3. Problems with Traditional Large-Scale Systems

Hadoop (Big Data) Ecosystem

  1. Motivation for Hadoop
  2. Different types of projects by Apache
  3. Role of projects in the Hadoop Ecosystem
  4. Key technology foundations required for Big Data
  5. Limitations and Solutions of existing Data Analytics Architecture
  6. Comparison of traditional data management systems with Big Data management systems
  7. Evaluate key framework requirements for Big Data analytics
  8. Hadoop Ecosystem & Hadoop 2.x core components
  9. Explain the relevance of real-time data
  10. Explain how to use big and real-time data as a Business planning tool

Hadoop Cluster – Architecture & Configuration Files

  1. Hadoop Master-Slave Architecture
  2. The Hadoop Distributed File System – Concept of data storage
  3. Explain different types of cluster setups (fully distributed, pseudo-distributed, etc.)
  4. Hadoop cluster set up – Installation
  5. Hadoop 2.x Cluster Architecture
  6. A Typical enterprise cluster – Hadoop Cluster Modes
  7. Understanding cluster management tools like Cloudera Manager/Apache Ambari

Hadoop Core Components – HDFS & MapReduce (YARN)

  1. HDFS Overview & Data storage in HDFS
  2. Getting data into Hadoop from a local machine (data loading techniques) and vice versa
  3. MapReduce overview (traditional way vs. MapReduce way) – concept of Mapper & Reducer
  4. Understanding MapReduce program Framework
  5. Develop MapReduce Program using Java (Basic)
  6. Develop MapReduce program with the streaming API (Basic)
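
As a rough illustration of the Mapper and Reducer concept listed above, the following Python sketch simulates the map, shuffle/sort and reduce phases of a word count inside a single process. It is not Hadoop code and not course material: the function names (`mapper`, `reducer`, `run_word_count`) and the sample lines are hypothetical, chosen only for illustration.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: sum all counts that were emitted for one word."""
    return word, sum(counts)


def run_word_count(lines):
    """Simulate map -> shuffle/sort -> reduce locally (no cluster involved)."""
    # Map: apply the mapper to every input line.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort: bring identical keys next to each other,
    # which is the job a MapReduce framework does between the two phases.
    pairs.sort(key=itemgetter(0))
    # Reduce: call the reducer once per distinct key.
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["big data is big", "data tools process big data"]
    print(run_word_count(sample))
```

In a Hadoop Streaming job, the mapper and reducer would typically be separate scripts that read from standard input and write to standard output, with the framework handling the shuffle and sort between them.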

Data Integration Using Sqoop & Flume

  1. Integrating Hadoop into an Existing Enterprise
  2. Loading Data from an RDBMS into HDFS by Using Sqoop
  3. Managing Real-Time Data Using Flume
  4. Accessing HDFS from Legacy Systems

Data Analysis Using PIG

  1. Introduction to Data Analysis Tools
  2. Apache Pig – MapReduce vs. Pig, Pig Use Cases
  3. Pig’s Data Model
  4. Pig Streaming
  5. Pig Latin Program & Execution
  6. Pig Latin : Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators, Pig UDF
  7. Writing Java UDFs
  8. Embedded Pig in Java
  9. Pig Macros
  10. Parameter Substitution
  11. Use Pig to automate the design and implementation of MapReduce applications
  12. Use Pig to apply structure to unstructured Big Data

Data Analysis Using HIVE

  1. Apache Hive – Hive vs. Pig – Hive Use Cases
  2. Discuss the Hive data storage principle
  3. Explain the File formats and Records formats supported by the Hive environment
  4. Perform operations with data in Hive
  5. Hive QL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts
  6. Hive Script, Hive UDF
  7. Hive Persistence formats
  8. Loading data in Hive – Methods
  9. Serialization & Deserialization
  10. Handling Text data using Hive
  11. Integrating external BI tools with Hadoop Hive

Data Analysis Using IMPALA

  1. Introduction to Impala & Architecture
  2. How Impala executes Queries and its importance
  3. Hive vs. Pig vs. Impala
  4. Extending Impala with User Defined functions

Introduction to Other Ecosystem Tools

  • NoSQL database – HBase
  • Introduction to Oozie

SPARK Introduction

  1. Introduction to Apache Spark
  2. Streaming Data Vs. In Memory Data
  3. Map Reduce Vs. Spark
  4. Modes of Spark
  5. Spark Installation Demo
  6. Overview of Spark on a cluster
  7. Spark Standalone Cluster

SPARK in Practice

  1. Invoking Spark Shell
  2. Creating the Spark Context
  3. Loading a File in Shell
  4. Performing Some Basic Operations on Files in Spark Shell
  5. Caching Overview
  6. Distributed Persistence
  7. Spark Streaming Overview (example: streaming word count)

SPARK meets HIVE

  1. Analyze Hive and Spark SQL Architecture
  2. Analyze Spark SQL
  3. Context in Spark SQL
  4. Implement a sample example for Spark SQL
  5. Integrating Hive and Spark SQL
  6. Support for JSON and Parquet File Formats
  7. Implement Data Visualization in Spark
  8. Loading of Data
  9. Hive Queries through Spark
  10. Performance Tuning Tips in Spark
  11. Shared Variables: Broadcast Variables & Accumulators

SPARK streaming

  1. Extract and analyze data from Twitter using Spark Streaming
  2. Comparison of Spark and Storm – Overview

SPARK GraphX

  1. Overview of the GraphX module in Spark
  2. Creating graphs with GraphX

Implement Machine Learning Using Spark

  1. Brief introduction to Machine learning framework
  2. Implement some of the ML algorithms using Spark MLlib (ML is not covered in detail in this course; for Machine Learning concepts, please refer to the Advance Big Data Science course or the Machine Learning Specialization course)

Final Project

  1. Consolidate all the learnings
  2. Working on Big Data Project by integrating various key components

Get Ahead with Databyte’s Certificate

Earn your Certificate

Our Certified Big Data Analytics program is exhaustive and this certificate is proof that you have taken a big leap in mastering the domain.

Differentiate yourself with a Certified Big Data Analytics certificate

The knowledge and skills you’ve gained working on projects, simulations and case studies will set you ahead of the competition.

Share your achievement

Talk about it on LinkedIn, Twitter and Facebook, boost your resume or frame it – tell your friends and colleagues about it.
