Negin Zarbakhsh

Reputation: 177

Subtracting two rows in one column based on other columns' values in a Python DataFrame

I am new to Python and I am trying to work with a COVID dataset. The tail of my COVID DataFrame is shown below.

I need a new column (covid['Daily_Confirmed']) that holds the difference between consecutive rows of the 'Confirmed' column, since that column contains cumulative totals.

The 'Confirmed' values of two rows should be subtracted only if their 'region', 'Population', and 'date' columns are the same.

This way we will have the number of daily confirmed cases for each region in the same DataFrame.

          region  Population        date  Confirmed
10889  Tipperary      159553  2021-04-22       5719
10890  Waterford      116176  2021-04-22       5542
10891  Westmeath      116176  2021-04-10       3780

for i in range(1, len(covid)):
    if (covid['region'][i] == covid['region'][i-1]) and (covid['Population'][i] == covid['Population'][i-1]) and (covid['date'][i] == covid['date'][i-1]):
        # write the difference into row i only, not the whole column
        covid.loc[i, 'Daily_Confirmed'] = covid['Confirmed'][i] - covid['Confirmed'][i-1]
covid.head()

Upvotes: 0

Views: 65

Answers (1)

Shubham Periwal

Reputation: 2248

Minor edit to @Scott Boston's answer

df.sort_values('date').groupby(['region', 'Population'])['Confirmed'].diff()
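
To make the one-liner concrete, here is a minimal sketch of how its result could be stored as the new column. The sample rows and their values are made up for illustration; only the column names and the name covid come from the question. It assumes 'Confirmed' is a cumulative count and that each region has one row per date.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
covid = pd.DataFrame({
    "region": ["Tipperary", "Waterford", "Tipperary", "Waterford", "Tipperary"],
    "Population": [159553, 116176, 159553, 116176, 159553],
    "date": pd.to_datetime(
        ["2021-04-21", "2021-04-21", "2021-04-20", "2021-04-22", "2021-04-22"]
    ),
    "Confirmed": [5700, 5530, 5690, 5542, 5719],
})

# diff() subtracts the previous row within each (region, Population) group.
# Sorting by date first makes "previous row" mean "previous reported date".
# The result keeps the original index labels, so the assignment aligns
# with the unsorted frame row by row.
covid["Daily_Confirmed"] = (
    covid.sort_values("date")
         .groupby(["region", "Population"])["Confirmed"]
         .diff()
)

print(covid.sort_values(["region", "date"]))
```

The earliest row of each region has no previous row to subtract, so its Daily_Confirmed is NaN. Whether to leave it as NaN or fill it with something else depends on how the data will be used.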

Upvotes: 2
