bluesummers
bluesummers

Reputation: 12647

Summing values across given range of days difference backwards - Pandas

I have created a days difference column in a pandas dataframe, and I'm looking to add a column that has the sum of a specific value over a given days window backwards

Notice that I can supply a date column for each row if it is needed, but the diff was created as days difference from the first day of the data.

Example

df = pd.DataFrame.from_dict({'diff': [0,0,1,2,2,2,2,10,11,15,18],
                            'value': [10,11,15,2,5,7,8,9,23,14,15]})
df
Out[12]: 
    diff  value
0      0     10
1      0     11
2      1     15
3      2      2
4      2      5
5      2      7
6      2      8
7     10      9
8     11     23
9     15     14
10    18     15

I want to add 5_days_back_sum column that will sum the past 5 days, including same day so the result would be like this

Out[15]: 
    5_days_back_sum  diff  value
0                21     0     10
1                21     0     11
2                36     1     15
3                58     2      2
4                58     2      5
5                58     2      7
6                58     2      8
7                 9    10      9
8                32    11     23
9                46    15     14
10               29    18     15

How can I achieve that? Originally I have a date column to create the diff column, if that helps its available

Upvotes: 1

Views: 79

Answers (1)

jezrael
jezrael

Reputation: 863301

Use custom function with boolean indexing for filtering range with sum:

def f(x):
    return df.loc[(df['diff'] >= x - 5) & (df['diff'] <= x), 'value'].sum()

df['5_days_back_sum'] = df['diff'].apply(f)
print (df)
    diff  value  5_days_back_sum
0      0     10               21
1      0     11               21
2      1     15               36
3      2      2               58
4      2      5               58
5      2      7               58
6      2      8               58
7     10      9                9
8     11     23               32
9     15     14               46
10    18     15               29

Similar solution with between:

def f(x):
    return df.loc[df['diff'].between(x - 5, x), 'value'].sum()

df['5_days_back_sum'] = df['diff'].apply(f)
print (df)
    diff  value  5_days_back_sum
0      0     10               21
1      0     11               21
2      1     15               36
3      2      2               58
4      2      5               58
5      2      7               58
6      2      8               58
7     10      9                9
8     11     23               32
9     15     14               46
10    18     15               29

Upvotes: 3

Related Questions