Reputation: 109
I have a dataset in which I have the following columns: date, id, value. I then want a running sum of the preceding 3 days (including the current day itself) for every id.
I have tried to look at several similar questions at StackOverflow, but none of them gives me the right result.
If the id has been seen only once within the last 3 days, the sum should be equal to that single value. The same goes if it has been seen two times within the last 3 days, obviously.
The desired output is the column "sum_3days":
date id value sum_3days
01/01/2019 1 2 2
01/01/2019 2 3 3
02/01/2019 1 2 4
02/01/2019 2 5 8
03/01/2019 1 2 6
03/01/2019 2 1 9
04/01/2019 1 6 10
05/01/2019 1 3 11
06/01/2019 1 6 15
06/01/2019 2 8 8
07/01/2019 1 3 12
07/01/2019 2 2 10
So basically, the sum should "give me the sum of all the values every id has had within the last 3 days"
Upvotes: 1
Views: 371
Reputation: 18647
Use groupby
, transform
and a lambda
with rolling
and sum
:
df['sum_3days'] = (df.groupby(['id'])['value']
.transform(lambda x: x.rolling(3, min_periods=1).sum()))
[output]
date id value sum_3days
0 2019-01-01 1 2 2
1 2019-01-01 2 3 3
2 2019-02-01 1 2 4
3 2019-02-01 2 5 8
4 2019-03-01 1 2 6
5 2019-03-01 2 1 9
6 2019-04-01 1 6 10
7 2019-05-01 1 3 11
8 2019-06-01 1 6 15
9 2019-06-01 2 8 14
10 2019-07-01 1 3 12
11 2019-07-01 2 2 11
Upvotes: 2
Reputation: 2613
Have you tried the function
Cumsum()
This webpage may be of help http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.cumsum.html
Upvotes: 0