gammaticatech

Python for Data Engineering

Learning Format

Live Online / Classroom

Total training duration

120 hrs

Syllabus

12 weeks

Certification

Yes

Python for Data Engineering

Python for Data Engineering focuses on using Python to collect, process, and manage large datasets efficiently. It involves working with libraries like Pandas, NumPy, and PySpark for data manipulation and transformation. Data engineers use Python scripts to automate ETL (Extract, Transform, Load) pipelines and integrate data from multiple sources such as APIs and databases. Tools like Airflow and SQLAlchemy support workflow automation and database access.
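To make the ETL idea concrete, here is a minimal sketch of an extract, transform, load run using only the Python standard library. The sample data, the `payments` table, and the function names are hypothetical and chosen for illustration; a real pipeline in this course would more likely use Pandas, PySpark, or SQLAlchemy for the same steps.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real CSV export.
RAW_CSV = """id,name,amount
1, Alice ,10.50
2,Bob,
3, Carol ,7.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim names, convert types, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((int(row["id"]), row["name"].strip(), float(amount)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(text, conn):
    """Run the three stages in order and return (row count, total amount)."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM payments").fetchone()


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
row_count, total = run_pipeline(RAW_CSV, conn)
print(row_count, total)
```

Keeping each stage in its own function means any one of them can be swapped out later, for example reading from an API instead of CSV text, without touching the others.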

Syllabus Summary

  • ETL basics
  • Assignment → Parse CSV → JSON
  • SQL basics: DB connection
  • Assignment → CRUD ops
  • SQLAlchemy ORM
  • Assignment → Employee DB queries
  • Batch processing with Pandas
  • Assignment → Dataset cleanup
  • Mock Interview 1
  • PySpark basics
  • Assignment → Read big dataset
  • PySpark transformations
  • Assignment → Transformation pipeline
  • PySpark SQL
  • Assignment → Sales joins
  • Airflow basics (Week 8): DAGs & scheduling
  • Assignment → ETL DAG
  • Mock Interview 2
  • AWS Boto3 basics
  • Assignment → Store files in S3
  • AWS Boto3 basics
  • Assignment → Store files in S3
  • Kafka basics
  • Assignment → Stream consumer
  • End-to-End ETL pipeline
  • Assignment → Full ETL run
  • Capstone Project: Cloud ETL
  • Mock Interview 3

Course Summary

Eligibility

Working professionals from tech and non-tech backgrounds, freshers, and graduates from any domain.

Live Doubt Solving

Get your queries resolved in dedicated daily doubt-solving sessions.

Instructor

Experts and trainers for top tech companies.

Certification

10+ globally recognized, ISO-certified certifications.

Mode of Learning

100% Live Learning with experienced instructors and hands-on sessions.

Real-Time Projects

Get practical experience with real-world projects for a career in analytics.
