Big Data and Spark
- Duration: 2 days
- Fee: Request
- Start Date: Request

Analyze Large Datasets
Handling big data requires distributed storage and processing. Apache Spark is one of the most widely used tools for distributing machine learning workloads. To use Apache Spark, one needs to understand its architecture and design as well as how to use it in practice.
Learning outcome:
- Know what big data means
- Know what cloud computing means and how to use it
- Understand the fundamentals of distributed storage and computing
- Ability to process big data on a Spark cluster from Python (using PySpark)
Who should attend:
2 days of in-depth learning
Face-to-face with an experienced data scientist.
Course Methodology
This course combines presentations and workshops.
CADS Certification
Earn certification upon completion.
Python Programming II, Database Management Systems
Undergraduate Degree
Training Track
Enterprise Data Scientist (EDS)
Big Data and Spark is one of the modules under our Enterprise Data Scientist (EDS) programme. EDS is a 42-day training programme that equips participants to lead and contribute to a data science team and to analyze data to drive informed business decisions.

Details of Subject
- Introduction to Big Data
- Apache Hadoop overview
- HDFS architecture
- Distributed processing
- Hadoop MapReduce
- RDDs (Resilient Distributed Datasets)
- Apache Spark
- Cloud Computing: introduction to cloud computing platforms, namely AWS (Amazon Web Services), GCP (Google Cloud Platform), and Microsoft Azure
- DataFrames and Spark SQL
  - Creating and transforming DataFrames; groupBy and aggregate functions
  - DataFrames and RDDs
- Spark MLlib
  - Introduction to Machine Learning
  - Using Machine Learning with Spark
Lead Instructor

CADS Certification
EDS CADS Certified Enterprise Data Scientist
Certification information for this module & track will be made available soon.
