Germany

Neeraj Kumar Mn

Data Engineer | Cloud & Big Data Specialist

MSc Data Science · Arden University, Germany

Building scalable ETL pipelines and cloud-native data solutions that turn raw data into actionable insight.


Open to work


About Me

Turning raw data into reliable insight

Data Engineer with around 2 years of experience in Python, SQL, and PySpark, building and optimising scalable ETL pipelines and cloud analytics solutions on AWS and Azure. Currently pursuing an MSc in Data Science at Arden University, Germany. Passionate about automation, cloud-native architectures, and delivering reliable data infrastructure that engineering and analytics teams can depend on.

MSc Data Science

Arden University, Germany · Feb 2025 – Jun 2026

  • Around 2 years of professional Data Engineering experience
  • Expert in Python, PySpark & SQL
  • AWS & Azure cloud infrastructure
  • Based in Germany — open to remote & hybrid roles

95%

Automation improvement

100K+

Records processed

35%

Performance optimisation

Tech Stack

Skills & Technologies

Programming

Python
SQL

Big Data

Apache Spark
PySpark
Kafka
Hadoop

Cloud

AWS
Azure
Azure Databricks
Microsoft Fabric

Databases

PostgreSQL
MongoDB
MySQL
Snowflake

Tools

Apache Airflow
Azure Data Factory
Docker
Git

Visualisation

Power BI

Communication

Languages

English

C1

Professional Working

90%

German

B2

Upper Intermediate

65%

Career

Work Experience

Junior Data Engineer

Upconnect Labs LLP

Jun 2024 – May 2025 · Bangalore, India
  • Designed and maintained scalable data pipelines using Python, SQL, and PySpark to support analytics and reporting needs.
  • Optimised SQL-based analytical queries by improving join logic and indexing, reducing execution time by 25–35%.
  • Stabilised and improved production-grade PySpark pipelines in Azure Databricks, resolving failures and ensuring consistent data processing.
  • Collaborated with analysts and stakeholders to translate reporting requirements into Power BI dashboards, reducing manual reporting workload by 40% and improving KPI visibility.
  • Implemented data transformation and cleansing logic to improve dataset consistency and ensure analytics-ready outputs.

Portfolio

Featured Projects

End-to-end data pipelines built on AWS and Azure showcasing scalable ETL and cloud analytics.

CarePlus AWS ETL Pipeline

Event-driven healthcare analytics pipeline on AWS that automates end-to-end data processing from raw ticket data through S3, Lambda, Glue, and Redshift, with query performance optimised via Parquet conversion.

Python · AWS S3 · AWS Lambda · AWS Glue · AWS Redshift · AWS Athena

  • Improved data quality by 35% via Glue ETL cleansing
  • Reduced ETL runtime by 50% with S3 event triggers
  • Lowered storage costs via Parquet conversion
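
The event-driven step of this pipeline can be illustrated with a minimal sketch. It assumes the raw ticket files arrive as CSV and that a Lambda function reacts to S3 "object created" notifications by preparing a Glue job run. The job name, bucket name, and argument name are hypothetical and not taken from the project itself, and the actual Glue call is left as a comment so the sketch runs offline.

```python
from urllib.parse import unquote_plus

# Hypothetical Glue job name, used only for illustration.
GLUE_JOB_NAME = "careplus-tickets-etl"


def build_glue_runs(event: dict) -> list:
    """Turn an S3 event notification into one Glue job request per new CSV object."""
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.endswith(".csv"):
            continue  # assumption: only CSV ticket files are processed
        runs.append({
            "JobName": GLUE_JOB_NAME,
            "Arguments": {"--source_path": f"s3://{bucket}/{key}"},
        })
    return runs


def lambda_handler(event, context):
    runs = build_glue_runs(event)
    # In a deployed function each entry could be passed to
    # boto3.client("glue").start_job_run(**run); omitted here to stay offline.
    return {"started": len(runs), "runs": runs}


# Hypothetical notification with one ticket file and one unrelated file.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "careplus-raw"},
                "object": {"key": "raw/support+tickets.csv"}}},
        {"s3": {"bucket": {"name": "careplus-raw"},
                "object": {"key": "raw/readme.txt"}}},
    ]
}

print(lambda_handler(sample_event, None))
```

Keeping the event parsing in its own function makes the trigger logic easy to test without AWS credentials.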

Olist E-commerce Azure Pipeline

End-to-end Azure data pipeline ingesting 100K+ records from GitHub, MySQL, and MongoDB into ADLS Gen2 via Azure Data Factory, transforming to curated Silver datasets in Databricks, and delivering analytics in Azure Synapse.

Azure Data Factory · Azure Databricks · PySpark · ADLS Gen2 · Azure Synapse Analytics · Python · SQL · MongoDB

  • 100K+ records ingested daily from 3 sources
  • Reduced manual integration by 95%
  • Enhanced data accuracy by 30% via PySpark deduplication
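
The deduplication idea can be sketched in plain Python as a stand-in for the PySpark job, which is not reproduced here. The sketch assumes each record has a business key and a last-updated timestamp, and keeps only the most recent version per key. The field names and sample rows are hypothetical.

```python
from datetime import datetime


def deduplicate_latest(records, key_field, updated_field):
    """Keep one record per key: the one with the most recent timestamp."""
    latest = {}
    for rec in records:
        key = rec[key_field]
        current = latest.get(key)
        # Replace the stored record only when this one is strictly newer.
        if current is None or rec[updated_field] > current[updated_field]:
            latest[key] = rec
    return list(latest.values())


# Hypothetical order rows: A1 was updated once, B2 is an exact duplicate.
orders = [
    {"order_id": "A1", "status": "created", "updated_at": datetime(2025, 1, 1, 9, 0)},
    {"order_id": "A1", "status": "shipped", "updated_at": datetime(2025, 1, 2, 9, 0)},
    {"order_id": "B2", "status": "created", "updated_at": datetime(2025, 1, 1, 10, 0)},
    {"order_id": "B2", "status": "created", "updated_at": datetime(2025, 1, 1, 10, 0)},
]

clean = deduplicate_latest(orders, "order_id", "updated_at")
for row in clean:
    print(row["order_id"], row["status"])
```

In PySpark the same rule is commonly expressed with `dropDuplicates` for exact duplicates, or with a window function and `row_number` when the latest version per key is needed.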

NYC Taxi Azure & Databricks Pipeline

Cloud-native pipeline extracting 10K+ monthly NYC Taxi records via REST API into ADLS Gen2, transformed through a Bronze–Silver–Gold Databricks lakehouse using PySpark and Delta Lake for downstream analytics.

Azure Data Factory · ADLS Gen2 · Azure Databricks · PySpark · Delta Lake · SQL · Python · REST APIs

  • 10K+ records ingested monthly via API
  • Reduced manual effort by 90%
  • Delta Lake ensures data readiness for reporting
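
The Bronze–Silver–Gold layering can be illustrated with a small plain-Python sketch. It is not the Databricks implementation: lists and dicts stand in for Delta tables, and the field names, validation rules, and sample trips are assumptions chosen for illustration.

```python
from collections import defaultdict

# Bronze: raw API payloads kept as received (field names are hypothetical).
bronze = [
    {"pickup": "2025-01-03T08:15:00", "fare": "12.50", "passengers": "1"},
    {"pickup": "2025-01-03T09:40:00", "fare": "-4.00", "passengers": "2"},  # invalid fare
    {"pickup": "2025-01-04T18:05:00", "fare": "30.00", "passengers": "3"},
    {"pickup": None, "fare": "9.00", "passengers": "1"},  # missing timestamp
]


def to_silver(rows):
    """Silver: typed and validated rows; records failing the checks are dropped."""
    silver = []
    for row in rows:
        if not row["pickup"]:
            continue
        fare = float(row["fare"])
        if fare < 0:
            continue
        silver.append({
            "pickup_date": row["pickup"][:10],  # keep the YYYY-MM-DD part
            "fare": fare,
            "passengers": int(row["passengers"]),
        })
    return silver


def to_gold(rows):
    """Gold: daily aggregates shaped for reporting."""
    totals = defaultdict(lambda: {"trips": 0, "revenue": 0.0})
    for row in rows:
        day = totals[row["pickup_date"]]
        day["trips"] += 1
        day["revenue"] += row["fare"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each layer reads only from the one before it, so raw data stays untouched in Bronze and any cleaning rule can be changed and replayed without calling the API again.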

Credentials

Certifications

Big Data Engineering with Spark & Hadoop

Udemy

2025

SQL for Data Analysis

CodeBasics

2025

100 Days of Code: Python Bootcamp

Udemy / Angela Yu

2025

Get In Touch

Let's Connect

Open to Data Engineering roles, freelance projects, and collaborations. Drop a message!