Germany

Neeraj Kumar Mn

Data Engineer | Cloud & Big Data Specialist

MSc Data Science · Arden University, Germany

Building scalable ETL pipelines and cloud-native data solutions that turn raw data into actionable insight.


Open to work

About Me

Turning raw data into reliable insight

Data Engineer with around 2 years of experience in Python, SQL, and PySpark, building and optimising scalable ETL pipelines and cloud analytics solutions on AWS and Azure. Currently pursuing an MSc in Data Science at Arden University, Germany. Passionate about automation, cloud-native architectures, and delivering reliable data infrastructure that engineering and analytics teams can depend on.

MSc Data Science

Arden University, Germany · Feb 2025 – Jun 2026

Around 2 years of professional Data Engineering experience
Expert in Python, PySpark & SQL
AWS & Azure cloud infrastructure
Based in Germany — open to remote & hybrid roles

95%

Automation improvement

100K+

Records processed

35%

Performance optimisation

Tech Stack

Skills & Technologies

Programming

Python

SQL

Big Data

Apache Spark

PySpark

Kafka

Hadoop

Cloud

AWS

Azure

Azure Databricks

Microsoft Fabric

Databases

PostgreSQL

MongoDB

MySQL

Snowflake

Tools

Apache Airflow

Azure Data Factory

Docker

Git

Visualisation

Power BI

Communication

Languages

English

Professional Working

90%

German

Upper Intermediate

65%

Career

Work Experience

Junior Data Engineer

Upconnect Labs LLP

Jun 2024 – May 2025 · Bangalore, India

Designed and maintained scalable data pipelines using Python, SQL, and PySpark to support analytics and reporting needs.
Optimised SQL-based analytical queries by improving join logic and indexing, reducing execution time by 25–35%.
Stabilised and improved production-grade PySpark pipelines in Azure Databricks, resolving failures and ensuring consistent data processing.
Collaborated with analysts and stakeholders to translate reporting requirements into Power BI dashboards, reducing manual reporting workload by 40% and improving KPI visibility.
Implemented data transformation and cleansing logic to improve dataset consistency and ensure analytics-ready outputs.

Portfolio

Featured Projects

End-to-end data pipelines built on AWS and Azure, showcasing scalable ETL and cloud analytics.

CarePlus AWS ETL Pipeline

Event-driven healthcare analytics pipeline on AWS that automates end-to-end data processing from raw ticket data through S3, Lambda, Glue, and Redshift, with query performance optimised via Parquet conversion.

Python · AWS S3 · AWS Lambda · AWS Glue · AWS Redshift · AWS Athena

Improved data quality by 35% via Glue ETL cleansing

Reduced ETL runtime by 50% with S3 event triggers

Lowered storage costs via Parquet conversion
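
As a rough illustration of the event-driven step in this pipeline, the sketch below shows how a Lambda handler might read an S3 event notification to find the newly uploaded file. It is a minimal sketch, not the project's actual code: the function names, the bucket name, and the returned payload are hypothetical, and the call that would start the Glue job is deliberately left out.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        bucket = s3_info.get("bucket", {}).get("name")
        key = s3_info.get("object", {}).get("key")
        if bucket and key:
            # S3 URL-encodes object keys in notifications (spaces arrive as "+").
            objects.append((bucket, unquote_plus(key)))
    return objects


def lambda_handler(event, context):
    # Hypothetical handler: a real one would trigger the downstream ETL job here.
    targets = extract_s3_objects(event)
    return {"queued": [f"s3://{bucket}/{key}" for bucket, key in targets]}
```

Keeping the parsing in its own function makes it easy to unit-test with a sample event, without deploying anything to AWS.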

Olist E-commerce Azure Pipeline

End-to-end Azure data pipeline ingesting 100K+ records from GitHub, MySQL, and MongoDB into ADLS Gen2 via Azure Data Factory, transforming to curated Silver datasets in Databricks, and delivering analytics in Azure Synapse.

Azure Data Factory · Azure Databricks · PySpark · ADLS Gen2 · Azure Synapse Analytics · Python · SQL · MongoDB

100K+ records ingested daily from 3 sources

Reduced manual integration by 95%

Enhanced data accuracy by 30% via PySpark deduplication
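
The deduplication idea can be sketched in plain Python as "keep the most recent record per business key". This is a stand-in for the PySpark logic, not the project's code: the field names (`order_id`, `updated_at`) and the function name are assumptions chosen for illustration. In PySpark the same result is commonly reached with `dropDuplicates` or a window function.

```python
def deduplicate_latest(records, key_field, version_field):
    """Keep one record per key, preferring the highest version value."""
    latest = {}
    for record in records:
        key = record[key_field]
        current = latest.get(key)
        # Replace the stored record only when a newer version arrives.
        if current is None or record[version_field] > current[version_field]:
            latest[key] = record
    return list(latest.values())


# Hypothetical sample rows: order "a" appears twice with different timestamps.
orders = [
    {"order_id": "a", "updated_at": "2025-01-01", "status": "created"},
    {"order_id": "b", "updated_at": "2025-01-02", "status": "created"},
    {"order_id": "a", "updated_at": "2025-01-03", "status": "shipped"},
]
unique_orders = deduplicate_latest(orders, "order_id", "updated_at")
```

ISO-formatted date strings compare correctly as text, which is why the sample uses them as the version field.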

NYC Taxi Azure & Databricks Pipeline

Cloud-native pipeline extracting 10K+ monthly NYC Taxi records via REST API into ADLS Gen2, transformed through a Bronze–Silver–Gold Databricks lakehouse using PySpark and Delta Lake for downstream analytics.

Azure Data Factory · ADLS Gen2 · Azure Databricks · PySpark · Delta Lake · SQL · Python · REST APIs

10K+ records ingested monthly via API

Reduced manual effort by 90%

Delta Lake ensures data readiness for reporting
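
The Bronze–Silver–Gold flow can be illustrated with a small plain-Python sketch: raw rows are validated and typed for Silver, then aggregated for Gold. It is only a conceptual outline under stated assumptions. The column names (`pickup_date`, `trip_distance`, `fare_amount`), the validation rules, and the function names are hypothetical, and the real pipeline uses PySpark and Delta Lake rather than Python lists.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Cast raw string fields to numbers and drop rows that fail validation."""
    silver = []
    for row in bronze_rows:
        try:
            pickup_date = row["pickup_date"]
            distance = float(row["trip_distance"])
            fare = float(row["fare_amount"])
        except (KeyError, TypeError, ValueError):
            continue  # skip malformed rows
        if distance > 0 and fare >= 0:
            silver.append(
                {"pickup_date": pickup_date, "trip_distance": distance, "fare_amount": fare}
            )
    return silver


def to_gold(silver_rows):
    """Aggregate cleaned trips into per-day trip counts and fare totals."""
    totals = defaultdict(lambda: {"trips": 0, "total_fare": 0.0})
    for row in silver_rows:
        day = totals[row["pickup_date"]]
        day["trips"] += 1
        day["total_fare"] += row["fare_amount"]
    return {date: dict(stats) for date, stats in totals.items()}
```

Each layer only reads from the one before it, so a bad batch can be reprocessed from Bronze without re-calling the API.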

Credentials

Certifications

Big Data Engineering with Spark & Hadoop

Udemy

2025

SQL for Data Analysis

CodeBasics

2025

100 Days of Code: Python Bootcamp

Udemy / Angela Yu

2025

Get In Touch

Let's Connect

Open to Data Engineering roles, freelance projects, and collaborations. Drop a message!

Contact Information

neerajde2000@gmail.com

Phone

+49 1774623827

GitHub

Location

Germany