A computer with decent memory and an internet connection
Ubuntu, macOS, or Windows as the operating system
What is this course about: This course teaches you how to use the Python bindings for Apache Spark’s data streaming capabilities. This course will be absolutely critical to anyone trying to make it in data science today.
What will you learn from this course: In this course, you’ll learn how to use Apache Spark for data streaming, and how to use it with the lingua franca of data science: Python. You’ll see demos of how to handle and manipulate many different types of data, and get hands-on experience through exercises such as building a Twitter analytics tool.
You’ll also learn how to use PySpark with other popular streaming tools like Apache Kafka (used for data streaming by large companies such as LinkedIn, where it was originally developed) and AWS tools like Kinesis.
Why should you learn Apache Spark streaming: Spark streaming is becoming incredibly popular, and with good reason. According to IBM, ninety percent of the data in the world today has been created in the last two years alone. Our current output of data is roughly 2.5 quintillion bytes per day. The world is being immersed in data, more so each and every day.
As such, analyzing static dataframes of data that never changes is becoming the less practical approach to more and more problems. This is where data streaming comes in: the ability to process data almost as soon as it’s produced, while taking the time-dependency of the data into account.
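To make the contrast with static analysis concrete, here is a minimal sketch in plain Python (not PySpark) of processing events as they arrive and grouping them by time. The event format, the `tumbling_window_counts` name, and the 10-second window size are all assumptions made for illustration; they are not taken from the course material or from Spark’s API.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, hashtag)
Event = Tuple[float, str]


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: float = 10.0
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Yield (window_start, counts) each time a time window closes.

    Assumes events arrive in timestamp order.
    """
    current_start = None
    counts: Counter = Counter()
    for timestamp, key in events:
        # Map the timestamp onto the start of its fixed-size window.
        window_start = (timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # A new window has begun: emit the finished one right away.
            yield current_start, dict(counts)
            current_start = window_start
            counts = Counter()
        counts[key] += 1
    # Emit the last, still-open window once the input ends.
    if current_start is not None:
        yield current_start, dict(counts)


sample_events = [
    (1.0, "#spark"),
    (4.5, "#python"),
    (9.9, "#spark"),
    (12.0, "#kafka"),
    (25.0, "#spark"),
]
results = list(tumbling_window_counts(sample_events))
for window_start, window_counts in results:
    print(window_start, window_counts)
```

Because the function is a generator, each window’s counts become available as soon as that window closes, rather than after the whole dataset has been collected. A streaming engine applies the same idea at scale, with added handling for late or out-of-order data that this sketch leaves out.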
What programming language is this course taught in: Python 3 (with heavy use of Jupyter Notebooks)
Who is the target audience?
Python developers looking to get better at data streaming
Managers or senior engineers in data engineering teams