Reputation: 546
This link contains CSV files of daily COVID-19 reports.
What is the best way to load all the CSV files into a single dataframe?
I tried the code below, adapted from other questions, but it doesn't work.
from pathlib import Path
import pandas as pd
files = Path('https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports')
csv_only = files.rglob('*.csv')
combo = [pd.read_csv(f)
         .assign(f.stem)
         .fillna(0)
         for f in csv_only]
one_df = pd.concat(combo,ignore_index=True)
one_df = one_df.drop_duplicates('date')
print(one_df)
How could I use requests to read all the files?
Upvotes: 0
Views: 877
Reputation: 1441
You can simply use the requests module to get the names of all the .csv files present, which eliminates the need to run glob:
import requests
url = "https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports"
csv_only = [i.split("=")[1][1:-1] for i in requests.get(url).text.split(" ") if '.csv' in i and 'title' in i]
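Once you have the file names, one way to get from there to a single dataframe is to turn each name into a raw-download URL and let pandas read it directly. This is only a sketch: it assumes the files are served from raw.githubusercontent.com under the same repository path on the master branch, `raw_url` and `combine_reports` are made-up helper names, and the `date` column is my guess at what the `.assign(...)` in the question was meant to add.

```python
from pathlib import PurePosixPath

# Assumption: raw file contents live under this base URL (master branch).
RAW_BASE = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_daily_reports/"
)


def raw_url(name):
    """Build the raw-download URL for one file name from the listing."""
    return RAW_BASE + name


def combine_reports(names):
    """Read every report and stack them into one dataframe.

    Each row is tagged with the stem of the file it came from
    (the file name without ".csv") in a hypothetical "date" column.
    """
    import pandas as pd  # third-party; imported here so raw_url() has no dependencies

    frames = [
        pd.read_csv(raw_url(name)).assign(date=PurePosixPath(name).stem)
        for name in names
    ]
    return pd.concat(frames, ignore_index=True)
```

You would call it as `one_df = combine_reports(csv_only)`. Note that calling `drop_duplicates('date')` afterwards, as in the question, would keep only one row per file, which is probably not what you want.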
Upvotes: 1
Reputation: 5960
pathlib only works with filesystems, so this won't do. csv_only will be an empty generator since there is no such location on your disk. You need to fetch the data from GitHub with actual HTTP requests. I did something similar for a personal project some time ago; you can have a look and modify it accordingly (it uses the GitHub API, so you'll need to get one).
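For illustration, here is roughly what the GitHub API route could look like. It is a minimal sketch, not the code I linked: it assumes the repository "contents" endpoint, which returns a JSON list of entries with `name`, `type` and `download_url` fields, and the helper names `csv_download_urls` and `fetch_listing` are made up. It uses `urllib` from the standard library to stay dependency-free; `requests.get(API_URL).json()` would give you the same listing.

```python
import json
import urllib.request

# Assumption: the directory is listed through the repository "contents" endpoint.
API_URL = (
    "https://api.github.com/repos/CSSEGISandData/COVID-19/contents/"
    "csse_covid_19_data/csse_covid_19_daily_reports?ref=master"
)


def csv_download_urls(entries):
    """Pick the download URLs of the .csv files out of a contents listing."""
    return [
        entry["download_url"]
        for entry in entries
        if entry.get("type") == "file" and entry["name"].endswith(".csv")
    ]


def fetch_listing(api_url=API_URL):
    """Fetch the directory listing as parsed JSON (needs network access)."""
    request = urllib.request.Request(
        api_url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With network access, `csv_download_urls(fetch_listing())` gives URLs you can pass straight to `pd.read_csv`. Two caveats, as far as I know: unauthenticated API calls are rate-limited, and the contents endpoint lists at most 1,000 files per directory, so for a larger directory you would need the Git Trees API instead.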
Upvotes: 0