gammaticatech

Big Data - Intermediate

Learning Format: Online mode

Total training duration: 80-90 hrs (2 months)

Syllabus: 8 weeks

Certification: Yes

Big Data Engineering – Intermediate

This course builds on the basics of Big Data and focuses on practical, real-world data engineering skills. You’ll learn to design scalable data pipelines, process large datasets, and work with tools like Hadoop, Spark, Kafka, and NoSQL databases. The course also covers ETL workflows, cloud data platforms, and data security.

Syllabus Summary

Spark Fundamentals

  • Apache Spark architecture (RDD, DAG, Catalyst optimizer)
  • Spark installation / cluster basics
  • RDD operations (map, filter, reduce, flatMap)
  • Hands-on: WordCount with RDDs
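
As a rough preview of the WordCount exercise, the sketch below mimics the flatMap, map, and reduce-by-key steps in plain Python, so it runs without a Spark installation. The function name `word_count` and the sample lines are illustrative assumptions, not course material; in PySpark the same pattern is usually written with the RDD methods `flatMap`, `map`, and `reduceByKey`.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines):
    """Count words the way an RDD WordCount does, using plain Python."""
    # flatMap step: split every line into words and flatten into one stream
    words = chain.from_iterable(line.lower().split() for line in lines)
    # map step: pair each word with an initial count of 1
    pairs = ((word, 1) for word in words)
    # reduce-by-key step: sum the counts for each distinct word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


# Hypothetical input standing in for lines read from a text file
sample_lines = ["big data big pipelines", "data at scale"]
counts = word_count(sample_lines)
```

On a cluster, Spark runs these same three steps in parallel across partitions of the input.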

DataFrames in Spark

  • DataFrames vs RDDs
  • Creating DataFrames from CSV/JSON
  • Column operations & filtering
  • Hands-on: DataFrame transformations

SparkSQL Basics

  • Running SQL queries inside Spark
  • SELECT, WHERE, Joins, GroupBy
  • Optimizations with Catalyst & Tungsten
  • Hands-on: SparkSQL on sales dataset

Advanced SparkSQL

  • Window functions (ROW_NUMBER, RANK)
  • Aggregations with multiple columns
  • Case Study: Customer behavior analysis
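
To illustrate how ROW_NUMBER and RANK differ on tied values, here is a minimal sketch that runs the window query through Python's built-in `sqlite3` module instead of Spark, so it needs no cluster. It assumes a SQLite build with window-function support (3.25 or later). The `purchases` table and its rows are made up for the example; the SELECT statement uses standard window syntax that Spark SQL also accepts.

```python
import sqlite3

# In-memory database with a small, made-up purchases table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("ana", 50), ("ana", 50), ("ana", 20), ("raj", 70)],
)

# ROW_NUMBER gives every row a distinct position within its partition;
# RANK gives tied rows the same value and then skips ahead.
query = """
SELECT customer,
       amount,
       ROW_NUMBER() OVER (PARTITION BY customer ORDER BY amount DESC) AS row_num,
       RANK()       OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk
FROM purchases
ORDER BY customer, row_num
"""
rows = conn.execute(query).fetchall()
conn.close()
```

For customer `ana`, the two purchases of 50 get row numbers 1 and 2 but share rank 1, and the purchase of 20 gets rank 3.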

PySpark for ETL

  • PySpark DataFrame API
  • Data ingestion, transformation, writing back
  • Cleaning & deduplication at scale
  • Hands-on: PySpark ETL project

Databricks Introduction

  • Databricks clusters & notebooks
  • Writing jobs in Databricks
  • Managing jobs & versioning notebooks
  • Mini Project: ETL pipeline on Databricks

Delta Lake Concepts

  • Why Delta Lake?
  • Time Travel & Schema Enforcement
  • Implementing upserts & deletes with Delta
  • Hands-on: Delta Lake integration with PySpark

AWS S3 Integration & Wrap-Up

  • AWS S3 as a Data Lake
  • Reading/writing Spark data to S3
  • Securing access with IAM
  • Module recap + Mock Interview 2 (Spark + Databricks + S3)

Course Summary

Eligibility

Working professionals (tech and non-tech), freshers, and graduates from any domain.

Live Doubt Solving

Get your queries resolved in dedicated daily doubt-solving sessions.

Instructor

Experts and trainers for top tech companies.

Certification

10+ globally recognized ISO certifications.

Mode of Learning

100% Live Learning with experienced instructors and hands-on sessions.

Real-Time Projects

Get practical experience with real-world projects for a career in analytics.
