gammaticatech

Big Data – Basics

Learning Format

Online mode

Total training duration

80-90 hrs (2 months)

Syllabus

8 weeks

Certification

Yes

Big Data Engineering – Basics

Big Data Engineering is the process of designing, building, and managing systems that collect, store, and analyze large volumes of data. In today’s digital world, organizations generate massive amounts of data every second—from social media, sensors, transactions, and applications. Big Data Engineers create the pipelines and infrastructure that make it possible to transform this raw data into meaningful insights.

Syllabus Summary

Big Data Foundations

  • What is Big Data? Batch vs Streaming processing
  • Hadoop ecosystem overview (HDFS, YARN, Hive, Spark)
  • Why Spark over MapReduce?
  • Industry use cases of Big Data
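
As a rough illustration of the batch vs streaming distinction listed above, the sketch below computes the same total two ways in plain Python: once over a complete, stored dataset (batch) and once incrementally as each event arrives (streaming). The `events` data and both function names are invented for this example; a real pipeline would use an engine such as Spark rather than Python lists.

```python
from typing import Iterable, Iterator

# Hypothetical sample events: (user, purchase_amount)
events = [("asha", 120.0), ("ravi", 80.0), ("asha", 40.0)]


def batch_total(records: list[tuple[str, float]]) -> float:
    """Batch: the full dataset is available before processing starts."""
    return sum(amount for _, amount in records)


def streaming_totals(stream: Iterable[tuple[str, float]]) -> Iterator[float]:
    """Streaming: update a running result as each event arrives."""
    running = 0.0
    for _, amount in stream:
        running += amount
        yield running  # an up-to-date answer is available after every event


print(batch_total(events))                    # one answer at the end
print(list(streaming_totals(iter(events))))   # one answer per event
```

Both approaches end at the same total; the difference is when results become available and how much data must be held at once.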

SQL Basics (Part 1)

  • RDBMS concepts refresher
  • SQL DDL and DML (CREATE, INSERT, UPDATE, DELETE)
  • SELECT statements & WHERE filters
  • ORDER BY, LIMIT usage
  • Hands-on: Run SQL queries on sample datasets
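
To give a feel for the statements named in this module, here is a minimal sketch that runs them against an in-memory SQLite database through Python's built-in `sqlite3` module. SQLite is only a stand-in for whichever RDBMS the course uses, and the `customers` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# DDL: define the table structure
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# DML: add, change, and remove rows
cur.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi"), (3, "Meera", "Pune")],
)
cur.execute("UPDATE customers SET city = 'Mumbai' WHERE id = 2")
cur.execute("DELETE FROM customers WHERE id = 3")

# SELECT with a WHERE filter, ORDER BY, and LIMIT
rows = cur.execute(
    "SELECT id, name, city FROM customers "
    "WHERE city <> 'Delhi' ORDER BY name LIMIT 5"
).fetchall()
print(rows)

conn.close()
```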

SQL Basics (Part 2)

  • Aggregations: SUM, COUNT, AVG, MIN, MAX
  • GROUP BY, HAVING
  • Combining filters with aggregations
  • Case Study: Retail/Banking dataset analysis
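
The difference between filtering rows (WHERE) and filtering groups (HAVING) is easier to see in a worked query. The sketch below again uses SQLite through Python's `sqlite3` module as a stand-in database; the `sales` table and its figures are hypothetical and are not the course's retail or banking dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100.0), ("North", 300.0), ("South", 50.0),
     ("South", 70.0), ("East", 500.0)],
)

query = """
SELECT store,
       COUNT(*)    AS orders,
       SUM(amount) AS total,
       AVG(amount) AS average,
       MIN(amount) AS smallest,
       MAX(amount) AS largest
FROM sales
WHERE amount >= 60          -- row filter, applied before grouping
GROUP BY store
HAVING SUM(amount) > 100    -- group filter, applied after aggregation
ORDER BY total DESC
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)

conn.close()
```

Here the South store drops out at the HAVING step: only one of its rows passes the WHERE filter, and that row's total of 70 is not above 100.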

SQL Joins

  • INNER, LEFT, RIGHT, FULL Joins
  • Hands-on: Joining Orders + Customers dataset
  • Joins across multiple tables
  • Business case study queries
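
A small Orders + Customers example shows how the join type changes the result. This is a sketch under stated assumptions: SQLite (via Python's `sqlite3`) stands in for the course database, the tables and rows are invented, and only INNER and LEFT joins are shown because older SQLite versions do not support RIGHT and FULL joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# INNER JOIN: only customers that have at least one matching order
inner_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN: every customer; order columns are NULL (None) when there is no match
left_rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```

Meera has no orders, so she appears only in the LEFT JOIN result, paired with `None`.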

Introduction to Hive

  • Hive architecture & components (Metastore, HDFS)
  • Difference between SQL & HiveQL
  • Creating & loading Hive tables
  • Hands-on: Query structured data in Hive

Hive Queries & Integrations

  • Hive DDL/DML commands
  • Partitioning & Bucketing basics
  • Performance considerations in Hive
  • Mini Project: Sales dataset analysis in Hive
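
Partitioning and bucketing are easier to picture with a toy model. In Hive, each partition is stored in its own subdirectory named `column=value`, and bucketing spreads rows across a fixed number of files by hashing a column. The Python sketch below imitates that layout only conceptually: the `sales` path, the sample rows, and the bucket count are assumptions, and `zlib.crc32` is a stand-in hash, not the function Hive uses.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # assumed bucket count for this illustration

# Hypothetical sales rows
rows = [
    {"order_id": 1, "customer": "asha", "sale_date": "2024-01-01"},
    {"order_id": 2, "customer": "ravi", "sale_date": "2024-01-01"},
    {"order_id": 3, "customer": "asha", "sale_date": "2024-01-02"},
]


def partition_dir(row: dict) -> str:
    """Partitioning: one directory per distinct value of the partition column."""
    return f"sales/sale_date={row['sale_date']}"


def bucket_id(row: dict) -> int:
    """Bucketing: hash a column, then take it modulo a fixed bucket count.

    crc32 is a stand-in hash chosen because it is deterministic.
    """
    return zlib.crc32(row["customer"].encode("utf-8")) % NUM_BUCKETS


# Group order ids by (partition directory, bucket number)
layout = defaultdict(list)
for row in rows:
    layout[(partition_dir(row), bucket_id(row))].append(row["order_id"])

for (directory, bucket), order_ids in sorted(layout.items()):
    print(directory, f"bucket_{bucket}", order_ids)
```

A query that filters on `sale_date` only needs to read the matching directory, which is the main performance benefit of partitioning. Rows for the same customer always land in the same bucket number.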

Python for Data Engineering

  • Python syntax, loops, functions, file handling
  • Pandas & NumPy basics for analysis
  • Hands-on: Cleaning CSV/JSON with Pandas
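
As an idea of what a cleaning exercise might involve, the sketch below tidies a small, made-up CSV with Pandas: it removes duplicate rows, normalizes text, and converts a column to numbers. The data and the specific cleaning steps are assumptions for illustration, not the course's actual exercise.

```python
import io

import pandas as pd

# Hypothetical messy input: stray spaces, a duplicate row, a non-numeric amount
raw_csv = io.StringIO(
    "order_id,customer,amount\n"
    "1, Asha ,250\n"
    "2,ravi,75\n"
    "2,ravi,75\n"
    "3,MEERA,unknown\n"
)
df = pd.read_csv(raw_csv)

df = df.drop_duplicates()                                    # remove the repeated row
df["customer"] = df["customer"].str.strip().str.title()      # trim spaces, fix casing
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")  # bad values become NaN
df = df.dropna(subset=["amount"]).reset_index(drop=True)     # drop rows with no amount

print(df)
```

The same steps apply to JSON input once it is loaded into a DataFrame, for example with `pd.read_json`.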

Intro to PySpark & Wrap-Up

  • What is PySpark? Scaling Python to Big Data
  • Creating Spark DataFrames
  • Simple transformations (select, filter, withColumn)
  • Module recap & review
  • Mock Interview 1 (SQL + Hive + Python basics)

Course Summary

Eligibility

Working professionals (tech and non-tech), freshers, and graduates from any domain.

Live Doubt Solving

Get your queries resolved in dedicated daily doubt-solving sessions.

Instructor

Experts and trainers for top tech companies.

Certification

10+ ISO-certified, globally recognized certifications.

Mode of Learning

100% Live Learning with experienced instructors and hands-on sessions.

Real time projects

Get practical experience with real-world projects for a career in analytics.
