Adam Grünwald
Adam Grünwald

Reputation: 9

Get original values from rolling sum in Pandas DataFrame

I got data describing the number of newly hospitalized persons for specific days and regions. The number of hospitalized persons is the rolling sum of new hospitalized persons for the last 7 days. The DataFrame looks like this:

Date         Region      sum_of_last_7_days
01.01.2020   1           1
02.01.2020   1           2
03.01.2020   1           3
04.01.2020   1           4
05.01.2020   1           5
06.01.2020   1           6
07.01.2020   1           7
08.01.2020   1           7
09.01.2020   1           7
01.01.2020   2           1
02.01.2020   2           2
03.01.2020   2           3
04.01.2020   2           4
05.01.2020   2           5
06.01.2020   2           6
07.01.2020   2           7
08.01.2020   2           7
09.01.2020   2           7
10.01.2020   2           4

The goal output is:

Date         Region      daily_new
01.01.2020   1           1
02.01.2020   1           1
03.01.2020   1           1
04.01.2020   1           1
05.01.2020   1           1
06.01.2020   1           1
07.01.2020   1           1
08.01.2020   1           0
09.01.2020   1           0
01.01.2020   2           1
02.01.2020   2           1
03.01.2020   2           1
04.01.2020   2           1
05.01.2020   2           1
06.01.2020   2           1
07.01.2020   2           1
08.01.2020   2           0
09.01.2020   2           0
10.01.2020   2           0

The way should be via undo the rolling sum operation with a window for 7 days, but I wasn't able to find any solution.

Upvotes: 0

Views: 208

Answers (1)

mozway
mozway

Reputation: 262484

To get the original, perform a diff and fill with the first value:

s = df.groupby('Region')['sum_of_last_7_days'].diff()
df['original'] = s.mask(s.isna(), df['sum_of_last_7_days'])

output:

          Date  Region  sum_of_last_7_days  original
0   01.01.2020       1                   1       1.0
1   02.01.2020       1                   2       1.0
2   03.01.2020       1                   3       1.0
3   04.01.2020       1                   4       1.0
4   05.01.2020       1                   5       1.0
5   06.01.2020       1                   6       1.0
6   07.01.2020       1                   7       1.0
7   08.01.2020       1                   7       0.0
8   09.01.2020       1                   7       0.0
9   01.01.2020       2                   1       1.0
10  02.01.2020       2                   2       1.0
11  03.01.2020       2                   3       1.0
12  04.01.2020       2                   4       1.0
13  05.01.2020       2                   5       1.0
14  06.01.2020       2                   6       1.0
15  07.01.2020       2                   7       1.0
16  08.01.2020       2                   7       0.0
17  09.01.2020       2                   7       0.0

Upvotes: 2

Related Questions