Reputation: 9
I got data describing the number of newly hospitalized persons for specific days and regions. The number of hospitalized persons is the rolling sum of new hospitalized persons for the last 7 days. The DataFrame looks like this:
Date Region sum_of_last_7_days
01.01.2020 1 1
02.01.2020 1 2
03.01.2020 1 3
04.01.2020 1 4
05.01.2020 1 5
06.01.2020 1 6
07.01.2020 1 7
08.01.2020 1 7
09.01.2020 1 7
01.01.2020 2 1
02.01.2020 2 2
03.01.2020 2 3
04.01.2020 2 4
05.01.2020 2 5
06.01.2020 2 6
07.01.2020 2 7
08.01.2020 2 7
09.01.2020 2 7
10.01.2020 2 4
The goal output is:
Date Region daily_new
01.01.2020 1 1
02.01.2020 1 1
03.01.2020 1 1
04.01.2020 1 1
05.01.2020 1 1
06.01.2020 1 1
07.01.2020 1 1
08.01.2020 1 0
09.01.2020 1 0
01.01.2020 2 1
02.01.2020 2 1
03.01.2020 2 1
04.01.2020 2 1
05.01.2020 2 1
06.01.2020 2 1
07.01.2020 2 1
08.01.2020 2 0
09.01.2020 2 0
10.01.2020 2 0
The way should be via undo the rolling sum operation with a window for 7 days, but I wasn't able to find any solution.
Upvotes: 0
Views: 208
Reputation: 262484
To get the original, perform a diff
and fill with the first value:
s = df.groupby('Region')['sum_of_last_7_days'].diff()
df['original'] = s.mask(s.isna(), df['sum_of_last_7_days'])
output:
Date Region sum_of_last_7_days original
0 01.01.2020 1 1 1.0
1 02.01.2020 1 2 1.0
2 03.01.2020 1 3 1.0
3 04.01.2020 1 4 1.0
4 05.01.2020 1 5 1.0
5 06.01.2020 1 6 1.0
6 07.01.2020 1 7 1.0
7 08.01.2020 1 7 0.0
8 09.01.2020 1 7 0.0
9 01.01.2020 2 1 1.0
10 02.01.2020 2 2 1.0
11 03.01.2020 2 3 1.0
12 04.01.2020 2 4 1.0
13 05.01.2020 2 5 1.0
14 06.01.2020 2 6 1.0
15 07.01.2020 2 7 1.0
16 08.01.2020 2 7 0.0
17 09.01.2020 2 7 0.0
Upvotes: 2