Covid-19 Story

Overview

We imported a CSV of open data on the Covid-19 pandemic into a MySQL database and ran multiple queries to extract descriptive statistics from it. We practiced creating temporary tables and views, using window functions (PARTITION BY), and more. We then used the MySQL connector in Microsoft Power BI to create a report and publish it on the web.
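As an illustration of the kind of window-function query practiced here, the following is a minimal sketch rather than the project's actual query. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server (this needs SQLite 3.25 or newer for window functions). The table name `covid`, its columns, and all numbers are assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database.
# Table name, column names, and values are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE covid (location TEXT, date TEXT, new_cases INTEGER)")
conn.executemany(
    "INSERT INTO covid VALUES (?, ?, ?)",
    [
        ("Canada", "2021-04-01", 100),
        ("Canada", "2021-04-02", 150),
        ("Canada", "2021-04-03", 50),
        ("Italy", "2021-04-01", 200),
        ("Italy", "2021-04-02", 300),
    ],
)

# Running total of new cases per country: PARTITION BY restarts the sum
# for each location, ORDER BY accumulates it day by day.
running_totals = conn.execute(
    """
    SELECT location,
           date,
           new_cases,
           SUM(new_cases) OVER (PARTITION BY location ORDER BY date) AS running_cases
    FROM covid
    ORDER BY location, date
    """
).fetchall()

for row in running_totals:
    print(row)
```

The SELECT statement itself should also be valid in MySQL 8.0 and later, where window functions are available.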

Objective

To analyze data related to the Covid-19 pandemic, including but not limited to vaccinations and casualties.

Context

This is a project that I built as a showcase of Power BI capabilities.

  • MySQL
  • Microsoft Power BI
  • Visual Studio Code
  • DBeaver
  • Data wrangling
  • Descriptive statistics
  • Visualizations
  • Geospatial analysis in Power BI
  • Data modeling
  • Format: Excel-CSV
  • Records: ~85K
  • Information: Number of cases, vaccinations, hospitalizations, and deaths due to Covid-19
  • Data Citation: Our World in Data
    Data collected till: 30 April 2021

Questions

  1. Which parts of the world have more Covid-19 cases?
  2. How do different countries differ in terms of the number of Covid-19 vaccinations?
  3. Which parts of the world have more Covid-19 deaths?

Solution

These three questions are answered by analyzing multiple variables in the dataset.

A Power BI dashboard is used to visualize these variables, so that the data can be filtered and analyzed at the country and continent levels.

A slicer is used to provide the ability to focus on an interval of time.

Data model

Power BI Report

Which parts of the world have more Covid-19 cases?

The data shows that Europe had the most Covid-19 cases in this period (41m cases), followed by North America (35m cases).

How do different countries differ in terms of the number of Covid-19 vaccinations?

The dashboard can be used to visualize vaccination counts per country and per continent. The visualization shows that Asia had the most vaccinations (more than 194m) during this time. This can be explained by the fact that India and China are among the top three countries with the most vaccinations.

Which parts of the world have more Covid-19 deaths?

In this period, 2.84m deaths due to Covid-19 were reported worldwide. Europe had the highest number, with more than 920k deaths.

Challenges

  • The data has many nulls. After investigation, it turned out that the dataset contains both continent- and country-level rows, so sums over several columns double-count values.
  • The data required some validation, as there were invalid values (e.g., negative numbers for new cases).
  • Due to the above, the data required multiple cleaning steps before it was usable in the report.
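The cleaning issues above can be sketched as follows. This is a hedged illustration, not the cleaning steps actually used in the project. It assumes that aggregate rows (continents, world) are the ones with a NULL continent, and that negative counts are replaced with NULL. Both choices, along with the table name `covid_raw`, the view name `covid_clean`, and the sample numbers, are assumptions for the example. Python's built-in sqlite3 module stands in for MySQL.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database.
# Table, view, and column names plus all values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE covid_raw (continent TEXT, location TEXT, date TEXT, new_cases INTEGER)"
)
conn.executemany(
    "INSERT INTO covid_raw VALUES (?, ?, ?, ?)",
    [
        ("Europe", "Italy", "2021-04-01", 200),
        ("Europe", "Spain", "2021-04-01", -30),  # invalid negative value
        ("Asia", "India", "2021-04-01", 500),
        (None, "Europe", "2021-04-01", 170),     # continent-level aggregate row
        (None, "World", "2021-04-01", 670),      # world-level aggregate row
    ],
)

# Assumed cleaning rules: keep country-level rows only (continent is set)
# and turn negative counts into NULL so that SUM() ignores them.
conn.execute(
    """
    CREATE VIEW covid_clean AS
    SELECT continent,
           location,
           date,
           CASE WHEN new_cases < 0 THEN NULL ELSE new_cases END AS new_cases
    FROM covid_raw
    WHERE continent IS NOT NULL
    """
)

# Summing the raw table mixes country rows with aggregate rows.
naive_total = conn.execute("SELECT SUM(new_cases) FROM covid_raw").fetchone()[0]

# Summing the cleaned view counts each country once.
clean_by_continent = conn.execute(
    """
    SELECT continent, SUM(new_cases)
    FROM covid_clean
    GROUP BY continent
    ORDER BY continent
    """
).fetchall()

print(naive_total)
print(clean_by_continent)
```

Putting the rules in a view keeps the raw import untouched, so the report can read from the cleaned view while the original rows remain available for checking.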

Recommendation

  • More analysis is required to establish meaningful relationships between variables such as the number of fully vaccinated people and population density, and the number of hospital admissions and deaths.

Deliverables