Get processing Big Data using RDDs, DataFrames, SparkSQL and Machine Learning, plus real-time streaming with Kafka!
What you’ll learn
- Use functional style Java to define complex data processing jobs
- Learn the differences between the RDD and DataFrame APIs
- Use an SQL style syntax to produce reports against Big Data sets
- Use Machine Learning Algorithms with Big Data and SparkML
- Connect Spark to Apache Kafka to process Streams of Big Data
- See how Structured Streaming can be used to build pipelines with Kafka
Requirements
- Java 8 is required for the course. Spark does not currently support Java 9+, and you need Java 8 for the functional lambda syntax
- Previous knowledge of Java is assumed, but anything above the basics is explained
- Some previous SQL experience will be useful for part of the course, but if you’ve never used SQL before, this will be a good first experience
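For readers unsure what the functional lambda syntax mentioned above looks like, here is a minimal sketch in plain Java 8. It is not taken from the course and does not use Spark: it uses only the standard Streams API to count words, a common first exercise in this style. The class name LambdaWordCount and the method countWords are hypothetical, chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain Java 8 only, no Spark dependency. Names here are hypothetical.
class LambdaWordCount {

    // Splits each line into lower-case words and counts how often each word occurs.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // One line becomes many words.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                // Drop empty tokens produced by leading whitespace.
                .filter(word -> !word.isEmpty())
                // Group identical words and count them; TreeMap keeps keys sorted.
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Spark and Kafka", "spark and Java");
        countWords(lines).forEach((word, count) -> System.out.println(word + " = " + count));
    }
}
```

Each arrow expression (for example, word -> !word.isEmpty()) is a lambda: a small function passed as an argument. This is the style of Java the requirement refers to.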
Who this course is for:
- Anyone who already knows Java and would like to explore Apache Spark
- Anyone new to Data Science who wants a fast way to get started, without learning Python, Scala or R!