Taming Big Data with Spark Streaming and Scala - Hands On!

Learn to process massive streams of data in real time on a cluster with Spark Streaming.

What's Inside

New for 2022: updated to the latest IntelliJ IDE!

"Big Data" analysis is a hot and highly valuable skill. Thing is, "big data" never stops flowing! Spark Streaming is a new and quickly developing technology for processing massive data sets as they are created - why wait for some nightly analysis to run when you can constantly update your analysis in real time? Whether it's clickstream data from a big website, sensor data from a massive "Internet of Things" deployment, financial data, or something else - Spark Streaming is a powerful technology for transforming and analyzing that data the moment it is created.

You'll be learning from a former engineer and senior manager at Amazon and IMDb.

This course gets your hands on real live Twitter data, simulated streams of Apache access logs, and even data used to train machine learning models! You'll write and run real Spark Streaming jobs right at home on your own PC, and toward the end of the course, we'll show you how to take those jobs to a real Hadoop cluster and run them in a production environment too.

Across more than 30 lectures and almost 6 hours of video content, you'll:

Get a crash course in the Scala programming language
Learn how Apache Spark operates on a cluster
Set up discretized streams with Spark Streaming and transform them as data is received
Use Structured Streaming to stream into DataFrames in real time
Analyze streaming data over sliding windows of time
Maintain stateful information across streams of data
Connect Spark Streaming with highly scalable sources of data, including Kafka, Flume, and Kinesis
Dump streams of data in real time to NoSQL databases such as Cassandra
Run SQL queries on streamed data in real time
Train machine learning models in real time with streaming data, and use them to make predictions that keep getting better over time
Package self-contained Spark Streaming code, then deploy and run it on a real Hadoop cluster using Amazon Elastic MapReduce

This course is very hands-on, filled with achievable activities and exercises to reinforce your learning. By the end of this course, you'll be confidently creating Spark Streaming scripts in Scala, and be prepared to tackle massive streams of data in a whole new way. You'll be surprised at how easy Spark Streaming makes it!

Course Curriculum

Getting Started

A Crash Course in Scala

Spark Streaming Concepts

Spark Streaming Examples with Twitter

Spark Streaming Examples with Clickstream / Apache Access Log Data

Integrating with Other Systems

Advanced Spark Streaming Examples

Spark Streaming in Production

You Made It!

Learning More

Certificate Available

868+ Students

35 Lectures

Lifetime Access

24/7 Support

Frank Kane

Frank Kane spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers, all the time. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology and on teaching others about big data analysis.
