Neel Shah

Hello! I'm Neel, an experienced Data Analyst passionate about extracting insights from data.

A lifelong learner skilled in math and programming, I'm passionate about data analysis and enjoy tackling complex challenges and building clear visualizations.

View My LinkedIn Profile

View My GitHub Profile

COVID-19 Reporting Project using Azure Data Factory

This project demonstrates how Azure Data Factory (ADF) is used to orchestrate data ingestion, transformation, and publishing for COVID-19 reporting, powered by Azure services such as Blob Storage, Data Lake Storage Gen2, and Azure SQL Database.

🧩 Architecture Overview

The solution ingests data from the ECDC website and population datasets stored in Azure Blob. These are processed and stored in Azure Data Lake Gen2. ADF Data Flows handle transformation logic, with final output written to Azure SQL Database for reporting.

🚀 Ingestion Pipelines

There are dedicated pipelines for ingesting population data and ECDC COVID-19 statistics. The ECDC ingestion uses a Lookup and ForEach pattern to dynamically pull multiple files from HTTP endpoints and load them into the data lake.
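A trimmed ADF pipeline definition for this Lookup-plus-ForEach pattern might look like the sketch below. Pipeline, dataset, and activity names here are illustrative placeholders, not the project's actual names:

```json
{
  "name": "pl_ingest_ecdc_data",
  "properties": {
    "activities": [
      {
        "name": "LookupFileList",
        "type": "Lookup",
        "typeProperties": {
          "source": { "type": "JsonSource" },
          "dataset": { "referenceName": "ds_ecdc_file_list", "type": "DatasetReference" },
          "firstRowOnly": false
        }
      },
      {
        "name": "ForEachSourceFile",
        "type": "ForEach",
        "dependsOn": [
          { "activity": "LookupFileList", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "items": { "value": "@activity('LookupFileList').output.value", "type": "Expression" },
          "activities": [
            {
              "name": "CopyFromHttpToDataLake",
              "type": "Copy",
              "typeProperties": {
                "source": { "type": "DelimitedTextSource" },
                "sink": { "type": "DelimitedTextSink" }
              },
              "inputs": [ { "referenceName": "ds_ecdc_http_source", "type": "DatasetReference" } ],
              "outputs": [ { "referenceName": "ds_ecdc_raw_dl", "type": "DatasetReference" } ]
            }
          ]
        }
      }
    ]
  }
}
```

The Lookup activity reads a JSON list of file names, the ForEach expression iterates over that list, and parameterized HTTP and data lake datasets would receive each file name at runtime.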

Ingestion Flow


Ingestion Validation

🛠 Data Transformation

Two data flows are used:

Cases & Deaths Data Flow filters for Europe, selects required columns, performs a pivot, enriches data with country codes from a lookup file, and loads into a processed dataset.
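In ADF's data flow script notation, that chain of transformations could be sketched roughly as follows. Stream names, column names, and pivot values are assumptions for illustration, not taken from the project:

```
source(allowSchemaDrift: true, validateSchema: false) ~> CasesDeathsSource
source(allowSchemaDrift: true) ~> CountryLookupSource
CasesDeathsSource filter(continent == 'Europe') ~> FilterEuropeOnly
FilterEuropeOnly select(mapColumn(country, country_code, population, indicator, daily_count, date)) ~> SelectRequiredFields
SelectRequiredFields pivot(groupBy(country, country_code, population, date),
    pivotBy(indicator, ['confirmed cases', 'deaths']),
    counts = sum(daily_count),
    columnNaming: '$V_$N') ~> PivotIndicator
PivotIndicator, CountryLookupSource lookup(country == country_name,
    multiple: false,
    broadcast: 'auto') ~> LookupCountryCode
LookupCountryCode sink(allowSchemaDrift: true) ~> ProcessedCasesDeathsSink
```

Each `~>` step names an output stream, so the filter, select, pivot, lookup, and sink stages map one-to-one onto the transformations described above.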

Transform Cases Data Flow

Hospital Admissions Data Flow enriches and splits data into weekly and daily summaries using aggregation, sorting, pivoting, and sink transformations.
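The weekly/daily split in that flow could be expressed in data flow script along these lines, again with assumed stream, column, and indicator names:

```
HospitalAdmissionsSource, CountryLookupSource lookup(country == country_name,
    multiple: false) ~> LookupCountry
LookupCountry split(indicator == 'Daily hospital occupancy' || indicator == 'Daily ICU occupancy',
    disjoint: false) ~> SplitDailyWeekly@(DailyPath, WeeklyPath)
WeeklyPath aggregate(groupBy(country, reported_year_week),
    weekly_count = sum(value)) ~> AggregateWeekly
AggregateWeekly sort(asc(reported_year_week, true)) ~> SortWeekly
SortWeekly sink() ~> WeeklySink
DailyPath sort(asc(reported_date, true)) ~> SortDaily
SortDaily sink() ~> DailySink
```

The conditional split routes rows into two named output streams, and each path ends in its own sink, which matches the weekly and daily summaries the flow produces.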

Transform Hospital Admissions Flow

Both data flows use a country lookup dataset pointing to a CSV file in the Azure Data Lake.
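A DelimitedText dataset pointing at such a lookup CSV might be defined as below; the dataset name, linked service name, container, and file name are placeholders:

```json
{
  "name": "ds_country_lookup",
  "properties": {
    "linkedServiceName": { "referenceName": "ls_adls_covidreporting", "type": "LinkedServiceReference" },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "lookup",
        "fileName": "country_lookup.csv"
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```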

Country Lookup Dataset

🧪 Processing Pipelines

These pipelines trigger the respective data flows, running them on AutoResolveIntegrationRuntime with configurable compute sizes and verbose logging.
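An Execute Data Flow activity configured this way might look like the following sketch (the activity and data flow names are hypothetical). Omitting an explicit integration runtime reference falls back to AutoResolveIntegrationRuntime, the `compute` block sets the Spark cluster size, and `traceLevel: "Fine"` enables verbose logging:

```json
{
  "name": "ExecuteCasesDeathsDataFlow",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataFlow": { "referenceName": "df_transform_cases_deaths", "type": "DataFlowReference" },
    "compute": { "coreCount": 8, "computeType": "General" },
    "traceLevel": "Fine"
  }
}
```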

Processing Flow

🗃 SQL Export

Two pipelines prepare data for export to Azure SQL Database, using wildcard datasets to collect all processed files and load them into SQL tables.
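A Copy activity that gathers the processed files with a wildcard source and writes them to a SQL sink could be sketched like this; the folder path, table name, and activity name are placeholders:

```json
{
  "name": "CopyCasesDeathsToSqlDb",
  "type": "Copy",
  "typeProperties": {
    "source": {
      "type": "DelimitedTextSource",
      "storeSettings": {
        "type": "AzureBlobFSReadSettings",
        "recursive": true,
        "wildcardFolderPath": "ecdc/cases_deaths",
        "wildcardFileName": "*.csv"
      }
    },
    "sink": {
      "type": "AzureSqlSink",
      "preCopyScript": "TRUNCATE TABLE covid_reporting.cases_and_deaths"
    }
  }
}
```

The wildcard settings pick up every processed CSV under the folder, and the `preCopyScript` clears the target table so each run produces a fresh load.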

SQL Case Export

✅ Monitoring

Pipeline executions are monitored in ADF Studio's runtime views, which show the status of each activity, including Lookup, Copy, and ForEach runs.

Pipeline Monitoring

📦 Data Sources

🔌 Linked Services

The project uses four key linked services that connect ADF to various data sources and sinks:
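As an illustration, an HTTP linked service for the ECDC source and an Azure SQL Database linked service might be defined as follows. The names, URL, and connection string are placeholders, and in practice secrets would come from Azure Key Vault rather than being stored inline:

```json
[
  {
    "name": "ls_http_ecdc",
    "properties": {
      "type": "HttpServer",
      "typeProperties": {
        "url": "https://opendata.ecdc.europa.eu/",
        "enableServerCertificateValidation": true,
        "authenticationType": "Anonymous"
      }
    }
  },
  {
    "name": "ls_sql_covid_db",
    "properties": {
      "type": "AzureSqlDatabase",
      "typeProperties": {
        "connectionString": "Data Source=placeholder.database.windows.net;Initial Catalog=covid-db;"
      }
    }
  }
]
```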

Linked Services