About
Work
ComIT Solution
|Data Engineer
Highlights
Architected a CDP based on the Medallion Architecture (Bronze, Silver, and Gold layers) using AWS Glue, S3, and Lake Formation.
Ingested raw data (Bronze) from diverse sources, including Kafka, MongoDB, and Postgres, into S3, and applied cleansing and transformation at the Silver layer using AWS Glue, Spark jobs, and Athena.
Curated the Gold layer for BI consumption using Amazon Redshift and Power BI dashboards.
ComIT Solution
|Data Engineer
Highlights
Azure Data Pipeline Automation: Designed and deployed a scalable ETL pipeline in Azure Data Factory (ADF) to ingest and process raw data from multiple sources into Azure Data Lake Storage (ADLS), reducing data ingestion time by 30%.
Data Lake Management and Optimization: Built a hierarchical data storage structure in Azure Data Lake Storage (ADLS) Gen2, leveraging partitioning and lifecycle policies to manage 1 TB of structured and unstructured data efficiently.
Azure Synapse Analytics for Reporting: Developed an end-to-end solution to integrate Synapse Analytics with ADF for transforming raw data into analytical models, enabling real-time reporting in Power BI.
Batch and Streaming Processing with Databricks: Implemented a hybrid processing pipeline using Azure Databricks to handle both batch and streaming data, improving data pipeline processing time by 40%.
Data Quality Monitoring and Alerts: Designed an automated data validation framework in Databricks to detect anomalies and ensure consistency across 50+ datasets. Alerts were integrated with Azure Monitor for proactive monitoring.
Physicswallah (PW)
|Data Engineer
Highlights
Implemented comprehensive cost optimization strategies for Redshift, resulting in a 25% cost reduction.
Utilized User Segmentation across platforms to pinpoint cost contributors.
Enhanced query performance through sort and distribution key implementation, termination of long-running queries, column compression, ANALYZE and VACUUM runs, and a manual Workload Management (WLM) setup with query priorities and priority groups.
Leveraged DBT to reduce the load from Hevo models and workflows, optimizing queries with incremental loading strategies and migrating models and entire workflows from Hevo to DBT.
Implemented OpenMetadata as a data catalog tool, using its APIs to enrich databases with detailed descriptions for tables and columns and to manage data assets and their ownership.
Developed a Python script to apply Dynamic Data Masking (DDM) across database tables by identifying unmasked columns with regex and masking 1400+ columns, ensuring data privacy and security. Also automated DDM on newly created tables.
Managed Hevo ETL pipelines and wrote various Python scripts to customize ETL processes according to business requirements. Proactively addressed production incidents, ensuring uninterrupted operations.
Education
SRM University, Chennai
B.Tech
CSE
Grade: 9.1 CGPA
Awards
Star Performer of the Month (x2) [PW]
for exceptional performance during those months
Skills
Languages
Python, SQL, PySpark.
Tools
Airflow, DBT, Snowflake, Hevo, OpenMetadata, Kafka, Metabase, Spark, Airbyte, Power BI, Hadoop, Hive.
AWS Services
Redshift, S3, RDS, Glue, Lambda, EMR, EC2, EKS, Aurora, Athena, Redshift Spectrum, Kinesis.
Databases
PostgreSQL, MongoDB, MySQL, Cassandra, ChromaDB (vector database).
Version Control
Git, GitHub, GitLab, CI/CD.
GenAI Frameworks
Streamlit (Advanced), PandasAI, Hugging Face, Ollama, LangChain, Pytesseract, PyMuPDF.
DevOps (Basics)
Docker, Kubernetes, Helm Chart.
Azure Services
Synapse Analytics, Azure Monitor, ADF, Databricks, ADLS Gen2, Azure Functions, Blob Storage.
Projects
Bank-Management-System
Summary
This simple console-based bank management system provides basic management of bank accounts and transactions. It mainly focuses on CRUD operations.