I am new to Python and am working with a COVID dataset. Below is the tail of my `covid` DataFrame.
I need a new column, `covid['Daily_Confirmed']`, that holds the daily change in 'Confirmed', because that column stores cumulative (running-total) counts.
For each row, the previous row's 'Confirmed' value should be subtracted from the current one when both rows have the same 'region' and 'Population', with the rows ordered by 'date'.
This way the same DataFrame will contain the number of new confirmed cases per day for each region.
| | region | Population | date | Confirmed |
|---|---|---|---|---|
| 10889 | Tipperary | 159553 | 2021-04-22 | 5719 |
| 10890 | Waterford | 116176 | 2021-04-22 | 5542 |
| 10891 | Westmeath | 116176 | 2021-04-10 | 3780 |
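```python
# Assumes covid has a default RangeIndex and is sorted by region, then date
for i in range(1, len(covid)):
    # Only subtract when the previous row belongs to the same region
    if (covid.loc[i, 'region'] == covid.loc[i - 1, 'region']) and (covid.loc[i, 'Population'] == covid.loc[i - 1, 'Population']):
        # Assign to this row only; assigning to covid['Daily_Confirmed'] would overwrite the whole column
        covid.loc[i, 'Daily_Confirmed'] = covid.loc[i, 'Confirmed'] - covid.loc[i - 1, 'Confirmed']
covid.head()
```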
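The row-by-row rule above can also be written without a Python loop by comparing each row with the one before it via `shift()`. This is a minimal sketch, assuming the rows are already sorted by region, then date; the helper name `daily_from_cumulative` and the small DataFrame are hypothetical, made-up values for illustration only.

```python
import pandas as pd

def daily_from_cumulative(covid: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: subtract the previous row's Confirmed only when
    the previous row has the same region and Population.
    Assumes rows are already sorted by region, then date."""
    prev = covid.shift(1)  # previous row aligned with the current one (first row is all NaN)
    same_group = (covid['region'] == prev['region']) & (covid['Population'] == prev['Population'])
    diff = covid['Confirmed'] - prev['Confirmed']
    # Rows that start a new region get NaN, like rows the loop never assigns
    return diff.where(same_group)

# Made-up cumulative counts, sorted by region then date
covid = pd.DataFrame({
    'region': ['Tipperary', 'Tipperary', 'Tipperary', 'Waterford', 'Waterford'],
    'Population': [159553, 159553, 159553, 116176, 116176],
    'date': ['2021-04-20', '2021-04-21', '2021-04-22', '2021-04-21', '2021-04-22'],
    'Confirmed': [5700, 5710, 5719, 5530, 5542],
})
covid['Daily_Confirmed'] = daily_from_cumulative(covid)
print(covid)
```

The first row of each region comes out as NaN because there is no earlier day to subtract.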
A minor edit to @Scott Boston's answer: sort by date first, so that `diff()` subtracts the previous day's total within each region.
```python
covid['Daily_Confirmed'] = covid.sort_values('date').groupby(['region', 'Population'])['Confirmed'].diff()
```
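A self-contained run of the same one-liner, as a sketch on a tiny made-up DataFrame (the values are illustrative, not from the real dataset). The rows are deliberately out of date order to show why `sort_values('date')` matters; the result of `diff()` keeps the original index, so assigning it back puts each value on the right row.

```python
import pandas as pd

# Made-up cumulative counts for illustration only; rows are not in date order
covid = pd.DataFrame({
    'region': ['Waterford', 'Tipperary', 'Tipperary', 'Waterford', 'Tipperary'],
    'Population': [116176, 159553, 159553, 116176, 159553],
    'date': pd.to_datetime(['2021-04-22', '2021-04-22', '2021-04-20', '2021-04-21', '2021-04-21']),
    'Confirmed': [5542, 5719, 5700, 5530, 5710],
})

# Sort by date, then take the difference within each (region, Population) group;
# the resulting Series aligns back to covid on its original index
covid['Daily_Confirmed'] = (
    covid.sort_values('date')
         .groupby(['region', 'Population'])['Confirmed']
         .diff()
)
print(covid.sort_values(['region', 'date']))
```

The earliest date in each region is NaN. If you would rather keep that day's cumulative value, one option is `covid['Daily_Confirmed'].fillna(covid['Confirmed'])`.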