Size: 3.53 MB
Hadoop, MapReduce, HDFS, Spark, Pig, Hive, HBase, MongoDB, Cassandra, Flume – the list goes on! Over 25 technologies.
What Will I Learn?
- Design distributed systems that manage “big data” using Hadoop and related technologies.
Use HDFS and MapReduce for storing and analyzing data at scale.
Use Pig and Spark to create scripts to process data on a Hadoop cluster in more complex ways.
- Analyze relational data using Hive and MySQL
- Analyze non-relational data using HBase, Cassandra, and MongoDB
- Query data interactively with Drill, Phoenix, and Presto
- Choose an appropriate data storage technology for your application
- Understand how Hadoop clusters are managed by YARN, Tez, Mesos, Zookeeper, Zeppelin, Hue, and Oozie.
- Publish data to your Hadoop cluster using Kafka, Sqoop, and Flume
- Consume streaming data using Spark Streaming, Flink, and Storm
- You will need access to a PC running 64-bit Windows, MacOS, or Linux with an Internet connection, if you want to participate in the hands-on activities and exercises. You must have at least 8GB of free RAM on your system; 10GB or more is recommended. If your PC does not meet these requirements, you can still follow along in the course without doing hands-on activities.
- Some activities will require some prior programming experience, preferably in Python or Scala.
- A basic familiarity with the Linux command line will be very helpful.
DescriptionThe world of Hadoop and “Big Data” can be intimidating – hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this course, you’ll not only understand what those systems are and how they fit together – but you’ll go hands-on and learn how to use them to solve real business problems! Learn and master the most popular big data technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We’ll go way beyond Hadoop itself, and dive into all sorts of distributed systems you may need to integrate with.
- Install and work with a real Hadoop installation right on your desktop with Hortonworks and the Ambari UI
- Manage big data on a cluster with HDFS and MapReduce
- Write programs to analyze data on Hadoop with Pig and Spark
- Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto
- Design real-world systems using the Hadoop ecosystem
- Learn how your cluster is managed with YARN, Mesos, Zookeeper, Oozie, Zeppelin, and Hue
- Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm
- Software engineers and programmers who want to understand the larger Hadoop ecosystem, and use it to store, analyze, and vend “big data” at scale.
- Project, program, or product managers who want to understand the lingo and high-level architecture of Hadoop.
- Data analysts and database administrators who are curious about Hadoop and how it relates to their work.
- System architects who need to understand the components available in the Hadoop ecosystem, and how they fit together.