How to calculate running sum based on ID and date

Question

I have a dataset in which I have the following columns: date, id, value. I then want a running sum of the preceding 3 days (including the current day itself) for every id.

I have tried to look at several similar questions at StackOverflow, but none of them gives me the right result.

If the id has been seen only once within the last 3 days, the sum should be equal to that single value. The same goes if it has been seen two times within the last 3 days, obviously.

The desired output is the column "sum_3days":

date    id  value   sum_3days  
01/01/2019  1   2   2    
01/01/2019  2   3   3    
02/01/2019  1   2   4    
02/01/2019  2   5   8   
03/01/2019  1   2   6   
03/01/2019  2   1   9    
04/01/2019  1   6   10    
05/01/2019  1   3   11
06/01/2019  1   6   15
06/01/2019  2   8   8
07/01/2019  1   3   12    
07/01/2019  2   2   10

So basically, the sum should "give me the sum of all the values every id has had within the last 3 days"

Chris Adams · Accepted Answer

Use groupby, transform and a lambda with rolling and sum:

df['sum_3days'] = (df.groupby(['id'])['value']
                   .transform(lambda x: x.rolling(3, min_periods=1).sum()))

[output]

         date  id  value  sum_3days
0  2019-01-01   1      2          2
1  2019-01-01   2      3          3
2  2019-02-01   1      2          4
3  2019-02-01   2      5          8
4  2019-03-01   1      2          6
5  2019-03-01   2      1          9
6  2019-04-01   1      6         10
7  2019-05-01   1      3         11
8  2019-06-01   1      6         15
9  2019-06-01   2      8         14
10 2019-07-01   1      3         12
11 2019-07-01   2      2         11

How to calculate running sum based on ID and date

Answers (2)

Related Questions