The Big Data Omnibus: Hadoop, Spark, Storm and QlikView

Hadoop, Spark, Storm and QlikView

What's Inside

This omnibus course covers four powerful technologies:

1. Hadoop

2. Spark

3. Apache Storm

4. QlikView

Hadoop

  • MapReduce: Mapper, Reducer, Sort/Merge, Partitioning, Shuffle and Sort

  • HDFS & YARN: NameNode, DataNode, ResourceManager, NodeManager, the anatomy of a MapReduce application, YARN scheduling, and configuring HDFS and YARN to tune your cluster's performance.
  • Build your Hadoop cluster:
    • Install Hadoop in Standalone, Pseudo-Distributed and Fully Distributed modes
    • Set up a Hadoop cluster using Linux VMs.
    • Set up a cloud Hadoop cluster on AWS with Cloudera Manager.
    • Understand HDFS, MapReduce and YARN and their interaction

  • Customize your MapReduce Jobs
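
To make the MapReduce stages listed above concrete, here is a minimal single-process sketch of a word count in plain Python. It is not Hadoop code: the function names (`mapper`, `partition`, `shuffle_and_sort`, `reducer`, `word_count`) and the toy hash are assumptions chosen for illustration, and a real job would run these stages across many machines.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def partition(key, num_reducers):
    """Partitioning: decide which reducer receives a given key.

    A deterministic toy hash (an assumption for this sketch); Hadoop's
    default partitioner hashes the key in a similar modulo fashion.
    """
    return sum(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(pairs, num_reducers):
    """Shuffle and sort: route pairs to reducers, group values by key, sort keys."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return [sorted(bucket.items()) for bucket in buckets]


def reducer(key, values):
    """Reduce phase: collapse all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines, num_reducers=2):
    """Run the map, shuffle/sort and reduce stages in order on in-memory lines."""
    pairs = chain.from_iterable(mapper(line) for line in lines)
    counts = {}
    for bucket in shuffle_and_sort(pairs, num_reducers):
        for key, values in bucket:
            word, total = reducer(key, values)
            counts[word] = total
    return counts


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog", "The fox"]))
```

Because every occurrence of a key is routed to the same reducer, each reducer can compute its totals without seeing any other reducer's data.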

Spark

What's Spark? If you are an analyst or a data scientist, you're used to juggling multiple tools for working with data: SQL, Python, R, Java and so on. With Spark, you have a single engine where you can explore and play with large amounts of data, run machine learning algorithms and then use the same system to productionize your code.

Analytics: Using Spark and Python, you can analyze and explore your data in an interactive environment with fast feedback. The course shows how to use RDDs and DataFrames to manipulate data with ease.

Apache Storm

Storm is to real-time stream processing what Hadoop is to batch processing. With Storm you can build applications that must respond to the latest data within seconds or minutes, such as finding the latest trending topics on Twitter or monitoring spikes in payment gateway failures. Storm handles everything from simple data transformations to applying machine learning algorithms on the fly.

This course has 25 Solved Examples on building Storm Applications.

What's covered?

1) Understanding Spouts and Bolts, the building blocks of every Storm topology

2) Running a Storm topology in local mode and in remote mode

3) Parallelizing data processing within a topology using different grouping strategies: Shuffle grouping, Fields grouping, Direct grouping, All grouping, Custom grouping

4) Managing reliability and fault-tolerance within Spouts and Bolts

5) Performing complex transformations on the fly using the Trident topology: Map, Filter, Windowing and Partitioning operations

6) Applying ML algorithms on the fly using libraries like Trident-ML and Storm-R
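
The grouping strategies in item 3 decide which task of a downstream bolt receives each tuple. The sketch below simulates three of them in plain Python. It does not use the Storm API: the function names, the dictionary-based tuples and the CRC32 hash are all assumptions made for illustration.

```python
import random
import zlib


def shuffle_grouping(tup, num_tasks, rng):
    """Shuffle grouping: pick a task at random so load spreads evenly."""
    return rng.randrange(num_tasks)


def fields_grouping(tup, fields, num_tasks):
    """Fields grouping: tuples with equal values in `fields` reach the same task."""
    key = "|".join(str(tup[field]) for field in fields)
    # CRC32 is a stand-in hash for this sketch; it is stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def all_grouping(tup, num_tasks):
    """All grouping: every task receives a copy of the tuple."""
    return list(range(num_tasks))


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    words = ["storm", "spark", "storm", "hadoop", "storm"]
    for word in words:
        tup = {"word": word}
        print(
            word,
            "shuffle ->", shuffle_grouping(tup, 4, rng),
            "fields ->", fields_grouping(tup, ["word"], 4),
            "all ->", all_grouping(tup, 4),
        )
```

Fields grouping is the usual choice when a bolt keeps per-key state such as a running word count, because every tuple for a given key arrives at the same task.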

QlikView

A QlikView app is like an in-memory database. The interactive nature of QlikView lets you explore and iterate very quickly to develop an intuitive feel for your data.

What's covered?

1) The QlikView in-memory data model

2) Use List boxes, Table boxes and Chart boxes to query data

3) Load data into a QV app from CSV and Databases, avoiding Synthetic keys and Circular references

4) Transform and add new fields in a Load script

5) Transform tables with Join, Keep

6) Effectively present your insights using elements such as charts, drill-downs and triggers

7) Nested aggregations in charts

8) Generic Loads, Mapping Loads, Crosstable
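
As an illustration of the Crosstable idea in item 8, the sketch below unpivots a wide table (one column per month) into a long table (one row per product and month). It is plain Python rather than QlikView load script, and the `crosstable` helper, its parameters and the sample data are all hypothetical.

```python
def crosstable(rows, qualifier_fields, attribute_name, data_name):
    """Unpivot wide rows: each non-qualifier column becomes its own output row."""
    long_rows = []
    for row in rows:
        # Qualifier fields are copied unchanged onto every output row.
        qualifiers = {field: row[field] for field in qualifier_fields}
        for column, value in row.items():
            if column in qualifier_fields:
                continue
            long_rows.append(
                {**qualifiers, attribute_name: column, data_name: value}
            )
    return long_rows


if __name__ == "__main__":
    # Made-up sample data: monthly sales with one column per month.
    wide = [
        {"Product": "A", "Jan": 10, "Feb": 12},
        {"Product": "B", "Jan": 7, "Feb": 9},
    ]
    for row in crosstable(wide, ["Product"], "Month", "Sales"):
        print(row)
```

The long shape is generally easier to filter and aggregate, since "Month" becomes a single field that a list box or chart dimension can use directly.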


Certificate Available
45219+ Students
120 Lectures
17+ Hours of Video
Lifetime Access
24/7 Support
Loonycorn

Loonycorn is made up of two individuals, Janani Ravi and Vitthal Srinivasan, who honed their tech expertise at Google and Stanford. The team believes it has distilled the instruction of complicated tech concepts into funny, practical, engaging courses, and is excited to share its content with eager students.