Neeraj Kumar Mn
Data Engineer | Cloud & Big Data Specialist
Building scalable ETL pipelines and cloud-native data solutions that turn raw data into actionable insight.

Open to work

About Me
Turning raw data into reliable insight
Data Engineer with around 2 years of experience in Python, SQL, and PySpark, building and optimising scalable ETL pipelines and cloud analytics solutions on AWS and Azure. Currently pursuing an MSc in Data Science at Arden University, Germany. Passionate about automation, cloud-native architectures, and delivering reliable data infrastructure that engineering and analytics teams can depend on.
MSc Data Science
Arden University, Germany · Feb 2025 – Jun 2026
- Around 2 years of professional Data Engineering experience
- Expert in Python, PySpark & SQL
- AWS & Azure cloud infrastructure
- Based in Germany — open to remote & hybrid roles
95%
Automation improvement
100K+
Records processed
35%
Performance optimisation
Tech Stack
Skills & Technologies
Programming
Big Data
Cloud
Databases
Tools
Visualisation
Communication
Languages
English
C1 · Professional Working
90%
German
B2 · Upper Intermediate
65%
Career
Work Experience
Junior Data Engineer
Upconnect Labs LLP
- Designed and maintained scalable data pipelines using Python, SQL, and PySpark to support analytics and reporting needs.
- Optimised SQL-based analytical queries by improving join logic and indexing, reducing execution time by 25–35%.
- Stabilised and improved production-grade PySpark pipelines in Azure Databricks, resolving failures and ensuring consistent data processing.
- Collaborated with analysts and stakeholders to translate reporting requirements into Power BI dashboards, reducing manual reporting workload by 40% and improving KPI visibility.
- Implemented data transformation and cleansing logic to improve dataset consistency and ensure analytics-ready outputs.
Portfolio
Featured Projects
End-to-end data pipelines built on AWS and Azure, showcasing scalable ETL and cloud analytics.
Event-driven healthcare analytics pipeline on AWS that automates end-to-end processing of raw ticket data through S3, Lambda, Glue, and Redshift, with query performance optimised via Parquet conversion.
End-to-end Azure data pipeline ingesting 100K+ records from GitHub, MySQL, and MongoDB into ADLS Gen2 via Azure Data Factory, transforming them into curated Silver datasets in Databricks, and delivering analytics in Azure Synapse.
Cloud-native pipeline extracting 10K+ monthly NYC Taxi records via REST API into ADLS Gen2 and transforming them through a Bronze–Silver–Gold Databricks lakehouse using PySpark and Delta Lake for downstream analytics.
Credentials
Certifications
Big Data Engineering with Spark & Hadoop
Udemy
2025
SQL for Data Analysis
CodeBasics
2025
100 Days of Code: Python Bootcamp
Udemy / Angela Yu
2025
Get In Touch
Let's Connect
Open to Data Engineering roles, freelance projects, and collaborations. Drop a message!