Reputation: 1125
I have a dataframe of Covid-19 deaths by country. Countries are identified in the Country column. Sub-national classification is based on the Province column.
I want to generate a dataframe which sums all columns (except the first 2, which are geographical data) based on the value in the Country column. In short, for each date, I want to collapse the observations for all provinces of a country so that I get a single number for each country.
Right now, I am able to do that for a single date:
import pandas as pd
url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
raw = pd.read_csv(url)
del raw['Lat']
del raw['Long']
raw.rename({'Country/Region': 'Country', 'Province/State': 'Province'}, axis=1, inplace=True)
raw2 = raw.groupby('Country')['6/29/20'].sum()
How can I achieve this for all dates?
Upvotes: 0
Views: 169
Reputation: 150735
You can use iloc:
# Lat and Long were already deleted, so the date columns start at position 2
raw2 = raw.iloc[:, 2:].groupby(raw.Country).sum()
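As a rough sketch of the same idea without positional indexing, you could drop the Province column by name and group on Country, which avoids depending on how many leading columns remain. The toy dataframe below is made up for illustration (its values are not real data); it only mimics the layout of raw after Lat and Long are deleted and the columns renamed.

```python
import pandas as pd

# Hypothetical toy data with the same layout as `raw` in the question
# (Province, Country, then one column per date); the numbers are invented.
raw = pd.DataFrame({
    'Province': ['Hubei', 'Beijing', None, 'Ontario', 'Quebec'],
    'Country': ['China', 'China', 'Italy', 'Canada', 'Canada'],
    '6/28/20': [10, 1, 5, 2, 3],
    '6/29/20': [11, 1, 6, 2, 4],
})

# Drop the non-numeric Province column by name, then sum every
# remaining (date) column within each country.
by_country = raw.drop(columns='Province').groupby('Country').sum()
print(by_country)
```

The result has one row per country and one column per date, so by_country['6/29/20'] should match what the single-date groupby in the question produces.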
Upvotes: 1