Reputation: 57411
I'm trying to plot this dataset of COVID-19 deaths as a time series of the number of deaths per country. So far, I've tried this script:
import requests
import pandas as pd
import matplotlib.pyplot as plt

def getdata():
    response = requests.get("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv")
    with open('data.csv', 'wb') as fp:
        fp.write(response.content)

if __name__ == "__main__":
    getdata()
    df = pd.read_csv('data.csv')
    dfg = df.groupby(by='Country/Region').sum()
    dfg.drop(labels=['Lat', 'Long'], axis=1, inplace=True)
    dfg.columns = pd.to_datetime(dfg.columns)
    dfplot = dfg.plot()
    plt.show()
which produces a data frame like this:
                    2020-01-22  2020-01-23  2020-01-24  ...  2020-03-25  2020-03-26  2020-03-27
Country/Region                                          ...
Afghanistan                  0           0           0  ...           2           4           4
Albania                      0           0           0  ...           5           6           8
Algeria                      0           0           0  ...          21          25          26
Andorra                      0           0           0  ...           1           3           3
Angola                       0           0           0  ...           0           0           0
...                        ...         ...         ...  ...         ...         ...         ...
Venezuela                    0           0           0  ...           0           0           1
Vietnam                      0           0           0  ...           0           0           0
West Bank and Gaza           0           0           0  ...           0           1           1
Zambia                       0           0           0  ...           0           0           0
Zimbabwe                     0           0           0  ...           1           1           1
However, the resulting plot does not show a time series, but rather has different countries on the X-axis:
I've tried reading the DataFrame.plot documentation to see how I could alter this behavior, but it's pretty terse. Any ideas how I might accomplish this?
Upvotes: 2
Views: 1388
Reputation: 101
Sorry if this is not the right place to ask; I'm new here. How would you plot the same curves, not as a function of the date, but as a function of the number of days since the 10th (or any other number) death, so that the first day with 10 or more deaths becomes day 1?
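One possible approach, as a rough sketch only: for each country, cut the cumulative series at the first day it reaches the threshold and renumber the remaining days starting from 1. The function name days_since_threshold and the demo numbers below are made up for illustration; the input is assumed to be a frame with dates as the index and one column of cumulative deaths per country (for example dfg.T from the question's script).

```python
import pandas as pd


def days_since_threshold(df_by_date, threshold=10):
    """Re-index each column by days since it first reached `threshold`.

    Hypothetical helper: expects dates as the index and one column of
    cumulative counts per country. Day 1 is the first day with a value
    greater than or equal to `threshold`.
    """
    aligned = {}
    for country, series in df_by_date.items():
        reached = (series >= threshold).to_numpy()
        if not reached.any():
            continue  # this country never reached the threshold
        start = reached.argmax()  # position of the first True
        tail = series.iloc[start:]
        aligned[country] = pd.Series(tail.to_numpy(), index=range(1, len(tail) + 1))
    result = pd.DataFrame(aligned)
    result.index.name = "Day"
    return result


# Made-up demo data, not taken from the real dataset.
demo = pd.DataFrame(
    {"A": [0, 5, 12, 20, 31], "B": [10, 15, 15, 40, 50], "C": [0, 0, 1, 2, 3]},
    index=pd.date_range("2020-03-01", periods=5),
)
aligned = days_since_threshold(demo, threshold=10)
print(aligned)
```

Calling aligned.plot() would then draw one curve per country against the day number. Countries that reach the threshold later simply have shorter curves, since the missing days are filled with NaN.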
Upvotes: 0
Reputation: 57411
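Following wwii's comment, another solution is to simply plot the transpose of the DataFrame, dfg.T. DataFrame.plot draws one line per column against the index, so the dates need to be in the index rather than in the columns.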
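To see what the transpose changes, here is a tiny illustration with a made-up frame (dfg_demo and its numbers are hypothetical, only shaped like the grouped data in the question):

```python
import pandas as pd

# Hypothetical miniature of the grouped frame: countries as rows, dates as columns.
dfg_demo = pd.DataFrame(
    [[0, 1, 3], [2, 5, 9]],
    index=pd.Index(["Country A", "Country B"], name="Country/Region"),
    columns=pd.to_datetime(["2020-03-25", "2020-03-26", "2020-03-27"]),
)

# DataFrame.plot uses the index for the x-axis and draws one line per column,
# so plotting dfg_demo directly would put the countries on the x-axis.
by_date = dfg_demo.T  # dates become the index, countries become the columns

print(by_date)
```

After the transpose, by_date.plot() would draw one line per country with the dates along the x-axis.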
If I also select only the ten countries with the most deaths at the latest date (i.e. sorted by the values of the last column), I arrive at the following script:
import requests
import pandas as pd
import matplotlib.pyplot as plt

def getdata():
    response = requests.get("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv")
    with open('data.csv', 'wb') as fp:
        fp.write(response.content)

if __name__ == "__main__":
    getdata()
    df = pd.read_csv('data.csv')
    # numeric_only=True keeps the text column 'Province/State' out of the sum,
    # which recent pandas versions no longer drop automatically
    dfg = df.groupby(by='Country/Region').sum(numeric_only=True)
    # Sort by the last column (the latest date), largest first
    dfg.sort_values(by=dfg.columns[-1], ascending=False, inplace=True)
    dfg.drop(labels=['Lat', 'Long'], axis=1, inplace=True)
    dfg.columns = pd.to_datetime(dfg.columns)
    # Keep the top 10 rows and transpose so that the dates form the index
    dfplot = dfg.iloc[:10].T.plot()
    plt.show()
which produces the same plot as shown in the accepted answer:
Upvotes: 1
Reputation: 107567
To get a time series plot in pandas, the dates should be in the index, not in the columns. Because the original data arrives with dates as columns, some reshaping is needed:
1. melt to reshape the original data from wide to long format, with Date as a column;
2. pivot_table to aggregate and reshape back to wide format, with countries as columns and Date as the index.
Then call DataFrame.plot as intended:
import pandas as pd
import matplotlib.pyplot as plt

df_deaths = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/"
                        "csse_covid_19_time_series/time_series_covid19_deaths_global.csv")
# MELT WIDE DATA TO LONG
df_deaths = (df_deaths.melt(id_vars = ['Province/State', 'Country/Region', 'Lat', 'Long'],
                            var_name = 'Date', value_name = 'Deaths')
                      .assign(Date = lambda x: pd.to_datetime(x['Date'])))
# PIVOT AGGREGATION TO GENERATE DATE INDEX BY COUNTRY COLUMNS
df_pvt = df_deaths.pivot_table(index='Date', columns='Country/Region',
                               values='Deaths', aggfunc='sum')
df_pvt.plot(kind='line')
plt.show()
Because the plot above is overwhelming, with nearly every country in the world on it, consider slicing out a handful of countries, such as the 10 most affected, and passing in a matplotlib Axes object for better control of the output:
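# RANK COUNTRIES BY THE SUM OF THEIR CUMULATIVE COUNTS OVER ALL DATES
top_countries = (df_deaths.groupby('Country/Region')['Deaths'].sum()
                          .sort_values(ascending=False))
fig, ax = plt.subplots(figsize=(15,6))
# KEEP ONLY THE TOP 10 COUNTRY COLUMNS AND DRAW ON THE PREPARED AXES
(df_pvt.reindex(top_countries.index.values[:10], axis = 'columns')
       .plot(kind='line', ax = ax))
plt.show()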
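Note that the ranking above sums the Deaths column over every date, and since the dataset holds cumulative counts, that is not the same as ranking by the latest total. If ranking by the most recent date is preferred, something like the following could be used instead. This is only a sketch: top_n_by_latest is a hypothetical helper, and the demo frame with its numbers is made up to stand in for df_pvt.

```python
import pandas as pd

# Made-up stand-in for df_pvt: Date index, one column per country,
# values are cumulative counts.
df_pvt_demo = pd.DataFrame(
    {"A": [5, 9, 14], "B": [20, 21, 22], "C": [0, 0, 1]},
    index=pd.date_range("2020-03-25", periods=3, name="Date"),
)


def top_n_by_latest(df_by_date, n=10):
    """Return the n column labels with the largest values in the last row."""
    return df_by_date.iloc[-1].nlargest(n).index.tolist()


top2 = top_n_by_latest(df_pvt_demo, n=2)
print(top2)
```

With the real data this would be used as df_pvt[top_n_by_latest(df_pvt)].plot(kind='line', ax=ax).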
Upvotes: 4