guialmachado

Reputation: 546

COVID-19 data analysis with Python from Github CSV

The link below contains CSV files with the daily COVID-19 reports:

https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports

What is the best way to get all the CSV files into a single DataFrame?

I tried the code below, adapted from other questions, but it doesn't work.

from pathlib import Path
import pandas as pd

files = Path('https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports')

csv_only = files.rglob('*.csv')

combo = [pd.read_csv(f)
         .assign(f.stem)
         .fillna(0)
         for f in csv_only]

one_df = pd.concat(combo,ignore_index=True)

one_df = one_df.drop_duplicates('date')
print(one_df)

How could I use requests to read all the files?

Upvotes: 0

Views: 877

Answers (2)

Partha Mandal

Reputation: 1441

You can simply use the requests module to get the names of all the .csv files present, which eliminates the need for glob:

import requests

url = "https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports"

# Scrape the directory listing's HTML and pull the file names out of the
# title="..." attributes of the entries that mention .csv.
csv_only = [i.split("=")[1][1:-1] for i in requests.get(url).text.split(" ") if '.csv' in i and 'title' in i]
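
With the file names in hand, here is a minimal sketch of the next step, assuming the standard raw.githubusercontent.com URL scheme for file contents (the source_file column name is just an illustration):

import pandas as pd

# Raw-content base URL for the daily-reports directory (assumed layout).
base = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/"
        "master/csse_covid_19_data/csse_covid_19_daily_reports/")

# pd.read_csv accepts URLs directly, so each report can be read straight
# from GitHub and tagged with its file name before concatenating.
combo = [pd.read_csv(base + name).assign(source_file=name).fillna(0)
         for name in csv_only]
one_df = pd.concat(combo, ignore_index=True)

Note that scraping the HTML listing is fragile; if GitHub changes the page markup, the split-based parsing above will break.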

Upvotes: 1

Alexander Ejbekov

Reputation: 5960

pathlib only works with filesystems, so this won't do: csv_only will be an empty generator, because no such location exists on your disk. You need to fetch the data from GitHub with actual HTTP requests. I did something similar for personal use some time ago; you can have a look and adapt it accordingly (it uses the GitHub API, so you'll need an API token).
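
For reference, a minimal sketch of the GitHub-API route described above, using the public REST contents endpoint (a token is only needed to raise the rate limit; the header shown is an optional assumption):

import requests
import pandas as pd

# The contents endpoint lists a directory of a public repo as JSON;
# each entry includes a download_url pointing at the raw file.
api_url = ("https://api.github.com/repos/CSSEGISandData/COVID-19/"
           "contents/csse_covid_19_data/csse_covid_19_daily_reports")

resp = requests.get(api_url)  # add headers={"Authorization": "token <TOKEN>"} if rate-limited
resp.raise_for_status()

csv_urls = [item["download_url"] for item in resp.json()
            if item["name"].endswith(".csv")]

# Concatenate every daily report into one DataFrame.
one_df = pd.concat((pd.read_csv(u) for u in csv_urls), ignore_index=True)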

Upvotes: 0
