Hello! I'm Neel, an experienced Data Analyst passionate about extracting insights from data.
Lifelong learner, skilled in math and programming, passionate about data analysis, and keen on tackling complex challenges and creating visualizations.
This project demonstrates how Azure Data Factory (ADF) is used to orchestrate data ingestion, transformation, and publishing for COVID-19 reporting, powered by Azure services such as Blob Storage, Data Lake Storage Gen2, and Azure SQL Database.
The solution ingests data from the ECDC website and population datasets stored in Azure Blob. These are processed and stored in Azure Data Lake Gen2. ADF Data Flows handle transformation logic, with final output written to Azure SQL Database for reporting.
There are dedicated pipelines for ingesting population data and ECDC COVID-19 statistics. The ECDC ingestion uses a Lookup and ForEach pattern to dynamically pull multiple files from HTTP endpoints and load them into the Data Lake.
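The Lookup and ForEach pattern can be sketched in plain Python to show the control flow: a Lookup reads a list of file descriptors, and ForEach runs one parameterised Copy per row. The config field names (`sourceBaseURL`, `sourceRelativeURL`, `sinkFileName`), the placeholder relative paths, and the `raw/ecdc` sink folder are assumptions for illustration, not the project's actual configuration, and the Copy step only records what would be transferred.

```python
import json

# Hypothetical config that the Lookup activity reads; field names and
# relative paths are placeholders, not the project's real file list.
LOOKUP_CONFIG = """
[
  {"sourceBaseURL": "https://opendata.ecdc.europa.eu",
   "sourceRelativeURL": "/placeholder/cases_deaths.csv",
   "sinkFileName": "cases_deaths.csv"},
  {"sourceBaseURL": "https://opendata.ecdc.europa.eu",
   "sourceRelativeURL": "/placeholder/hospital_admissions.csv",
   "sinkFileName": "hospital_admissions.csv"}
]
"""


def lookup(config_text):
    """Stand-in for the Lookup activity: return every row of the config."""
    return json.loads(config_text)


def copy_file(source_url, sink_path):
    """Stand-in for the Copy activity; records what would be copied where."""
    return {"source": source_url, "sink": sink_path}


def for_each(items, raw_folder="raw/ecdc"):
    """Stand-in for ForEach: one parameterised Copy per Lookup row."""
    runs = []
    for item in items:
        source_url = item["sourceBaseURL"] + item["sourceRelativeURL"]
        sink_path = f"{raw_folder}/{item['sinkFileName']}"
        runs.append(copy_file(source_url, sink_path))
    return runs


if __name__ == "__main__":
    for run in for_each(lookup(LOOKUP_CONFIG)):
        print(run["source"], "->", run["sink"])
```

In ADF itself, the Copy activity inside the ForEach would read each row through dynamic expressions such as `@item().sinkFileName`, so adding a new file only means adding a row to the config rather than a new activity.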
Two data flows are used:
Cases & Deaths Data Flow filters for Europe, selects required columns, performs a pivot, enriches data with country codes from a lookup file, and loads into a processed dataset.
Hospital Admissions Data Flow enriches and splits data into weekly and daily summaries using aggregation, sorting, pivoting, and sink transformations.
Both data flows use a country lookup dataset pointing to a CSV file in the Azure Data Lake.
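The two data flows can be approximated with standard-library Python to make the transformation steps concrete. This is a minimal sketch, not the ADF implementation: the column names (`continent`, `indicator`, `daily_count`, `year_week`, `value`), the indicator labels (`confirmed cases`, `deaths`, and a `Weekly` prefix for weekly metrics), and the lookup CSV contents are all hypothetical. The country lookup is treated like a left outer join, so unmatched countries keep `None` codes.

```python
import csv
import io
from collections import defaultdict

# Hypothetical country lookup CSV; the project's real file may use other columns.
COUNTRY_LOOKUP_CSV = """country,country_code_2_digit,country_code_3_digit
France,FR,FRA
Germany,DE,DEU
"""


def load_lookup(text):
    """Index the lookup rows by country name."""
    return {row["country"]: row for row in csv.DictReader(io.StringIO(text))}


def transform_cases_deaths(rows, lookup):
    """Filter to Europe, pivot indicators into columns, enrich with codes."""
    pivoted = defaultdict(dict)
    for r in rows:
        if r["continent"] != "Europe":  # Filter step
            continue
        # Select + pivot: one cell per (country, date), one column per indicator.
        pivoted[(r["country"], r["date"])][r["indicator"]] = r["daily_count"]
    out = []
    for (country, date), counts in sorted(pivoted.items(), key=lambda kv: kv[0]):
        codes = lookup.get(country, {})  # Lookup: unmatched rows keep None codes
        out.append({
            "country": country,
            "country_code_2_digit": codes.get("country_code_2_digit"),
            "country_code_3_digit": codes.get("country_code_3_digit"),
            "reported_date": date,
            "cases_count": counts.get("confirmed cases"),
            "deaths_count": counts.get("deaths"),
        })
    return out


def transform_hospital_admissions(rows):
    """Split into weekly and daily outputs, pivot indicators, then sort."""
    weekly, daily = defaultdict(dict), defaultdict(dict)
    for r in rows:
        is_weekly = r["indicator"].startswith("Weekly")  # Conditional split
        groups = weekly if is_weekly else daily
        cell = groups[(r["country"], r["year_week"] if is_weekly else r["date"])]
        # Aggregate + pivot: sum values per indicator column.
        cell[r["indicator"]] = cell.get(r["indicator"], 0) + r["value"]

    def to_rows(groups, period):
        result = [{"country": c, period: p, **vals} for (c, p), vals in groups.items()]
        result.sort(key=lambda row: row["country"])             # country A-Z ...
        result.sort(key=lambda row: row[period], reverse=True)  # ... newest period first
        return result

    return to_rows(weekly, "reported_year_week"), to_rows(daily, "reported_date")
```

The two-pass sort relies on Python's stable sort: sorting by country first and then by period (descending) yields rows ordered by newest period, with countries alphabetical within each period.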
Separate execution pipelines trigger the respective data flows, running them on the AutoResolveIntegrationRuntime with configurable compute sizes and verbose logging.
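To show roughly what such a pipeline step contains, the sketch below builds an Execute Data Flow activity definition as a Python dict. The property names (`ExecuteDataFlow`, `DataFlowReference`, `compute`, `traceLevel`) follow ADF's activity JSON as I understand it, with `traceLevel` set to `Fine` corresponding to verbose logging; treat them as assumptions to check against the ADF documentation. The data flow name `df_transform_cases_deaths` is hypothetical.

```python
import json


def execute_data_flow_activity(data_flow_name, core_count=8,
                               compute_type="General", verbose=True):
    """Return an Execute Data Flow activity definition as a dict (sketch)."""
    return {
        "name": f"Execute {data_flow_name}",
        "type": "ExecuteDataFlow",
        "typeProperties": {
            "dataFlow": {"referenceName": data_flow_name,
                         "type": "DataFlowReference"},
            # Run on the default Azure-managed integration runtime.
            "integrationRuntime": {
                "referenceName": "AutoResolveIntegrationRuntime",
                "type": "IntegrationRuntimeReference",
            },
            # Compute size is configurable per activity.
            "compute": {"computeType": compute_type, "coreCount": core_count},
            # Assumed mapping: "Fine" = Verbose, "Coarse" = Basic logging.
            "traceLevel": "Fine" if verbose else "Coarse",
        },
    }


if __name__ == "__main__":
    activity = execute_data_flow_activity("df_transform_cases_deaths", core_count=16)
    print(json.dumps(activity, indent=2))
```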
Two pipelines prepare the processed data for export to Azure SQL Database, using wildcard datasets to collect all processed files and stage them for import.
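The effect of a wildcard source can be illustrated with Python's `fnmatch` module: given a folder and a file-name pattern, only direct children of that folder whose names match are picked up. The folder layout and `part-*.csv` file names below are hypothetical (they resemble typical Spark output), and the non-recursive behaviour is an assumption matching a copy source with recursion turned off.

```python
from fnmatch import fnmatchcase

# Hypothetical processed-zone listing; real paths will differ.
PROCESSED_FILES = [
    "processed/ecdc/cases_deaths/part-00000.csv",
    "processed/ecdc/cases_deaths/_SUCCESS",
    "processed/ecdc/hospital_admissions_daily/part-00000.csv",
    "processed/ecdc/hospital_admissions_daily/part-00001.csv",
]


def match_wildcard(paths, folder, file_pattern):
    """Mimic a folder path plus wildcard file name on a copy source."""
    prefix = folder.rstrip("/") + "/"
    matches = []
    for path in paths:
        if not path.startswith(prefix):
            continue
        name = path[len(prefix):]
        # Direct children only (no recursion), matched case-sensitively.
        if "/" not in name and fnmatchcase(name, file_pattern):
            matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    for path in match_wildcard(PROCESSED_FILES, "processed/ecdc/hospital_admissions_daily", "*.csv"):
        print(path)
```

A pattern like `*.csv` conveniently skips marker files such as `_SUCCESS`, so only data files are staged for the SQL import.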
Pipeline executions are monitored in ADF Studio's runtime views, which show the status of each activity, including Lookup, Copy, and ForEach executions.
The project uses four key linked services that connect ADF to various data sources and sinks:
ls_ablob_testcovid19sa (Azure Blob Storage) for source population data.
ls_adl_testcovid19dl (Azure Data Lake Storage Gen2) for raw and processed storage.
ls_http_opendata_ecdc_europe_eu (HTTP) to ingest ECDC COVID-19 data.
ls_testcovid19db (Azure SQL Database) as the final reporting database.
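The four linked services can be summarised as data, alongside a sketch of what the HTTP one might look like as JSON. The connector type strings (`AzureBlobStorage`, `AzureBlobFS` for ADLS Gen2, `HttpServer`, `AzureSqlDatabase`) are given as I understand ADF's linked-service schema and should be verified; the base URL and anonymous authentication for the ECDC endpoint are assumptions, not confirmed project settings.

```python
import json

# Connector types as I understand ADF's linked-service JSON; verify against
# the Azure Data Factory connector docs before relying on them.
LINKED_SERVICES = {
    "ls_ablob_testcovid19sa": ("AzureBlobStorage", "source population data"),
    "ls_adl_testcovid19dl": ("AzureBlobFS", "raw and processed storage"),  # ADLS Gen2
    "ls_http_opendata_ecdc_europe_eu": ("HttpServer", "ECDC COVID-19 ingestion"),
    "ls_testcovid19db": ("AzureSqlDatabase", "final reporting database"),
}


def http_linked_service(name, base_url):
    """Build an anonymous HTTP linked-service definition (sketch)."""
    return {
        "name": name,
        "properties": {
            "type": "HttpServer",
            "typeProperties": {"url": base_url, "authenticationType": "Anonymous"},
        },
    }


if __name__ == "__main__":
    for name, (ls_type, role) in LINKED_SERVICES.items():
        print(f"{name:34} {ls_type:18} {role}")
    # Assumed base URL for the ECDC open-data host.
    ecdc = http_linked_service("ls_http_opendata_ecdc_europe_eu",
                               "https://opendata.ecdc.europa.eu/")
    print(json.dumps(ecdc, indent=2))
```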