saladin1991
saladin1991

Reputation: 152

How to inverse cumulative sum in Pandas

I have a rather simple problem to which I cannot seem to find any solution. I have looked at many similar questions online but their solutions don't work in my case.

I have a pandas dataset in the following format

        CountryCode  ConfirmedCases  ConfirmedDeaths        date
   335         ABW          4872.0             45.0  2020-12-01
   336         ABW          4902.0             45.0  2020-12-02
   337         ABW          4923.0             45.0  2020-12-03
   338         ABW          4942.0             45.0  2020-12-04
   339         ABW          4957.0             45.0  2020-12-05

ConfirmedCases and ConfirmedDeaths are both measured as cumulative sums by CountryCode, but I need the daily cases and daily deaths by CountryCode. This requires that I compute the difference between confirmed cases at time t and time t-1 for every country in my dataset--i.e. the difference between the number of cases today and the day before in the same country. Ideally, I would like a solution that allows me to register the daily number of cases and daily number of deaths by country as two new variables in the dataset. As a result, I would like something that should look like the following:

        CountryCode  ConfirmedCases  ConfirmedDeaths        date    DailyCases    Daily Deaths
   335         ABW          4872.0             45.0  2020-12-01     15            0
   336         ABW          4902.0             45.0  2020-12-02     30            0
   337         ABW          4923.0             45.0  2020-12-03     21            0
   338         ABW          4942.0             45.0  2020-12-04     19            0
   339         ABW          4957.0             45.0  2020-12-05     15            0

Thanks in advance for your help!

Upvotes: 1

Views: 390

Answers (1)

rudolfovic
rudolfovic

Reputation: 3276

Quang Hoang's solution is right except you should then replace NaN values with the initial value for each country:

df['DailyCases'] = df.groupby('CountryCode')['ConfirmedCases'].diff()
df['DailyCases'] = df['DailyCases'].fillna(df['ConfirmedCases'])

Upvotes: 1

Related Questions