Reputation: 12647
I have created a days difference column in a pandas dataframe, and I'm looking to add a column that has the sum of a specific value over a given days window backwards
Notice that I can supply a date column for each row if it is needed, but the diff was created as days difference from the first day of the data.
Example
df = pd.DataFrame.from_dict({'diff': [0,0,1,2,2,2,2,10,11,15,18],
'value': [10,11,15,2,5,7,8,9,23,14,15]})
df
Out[12]:
diff value
0 0 10
1 0 11
2 1 15
3 2 2
4 2 5
5 2 7
6 2 8
7 10 9
8 11 23
9 15 14
10 18 15
I want to add 5_days_back_sum
column that will sum the past 5 days, including same day so the result would be like this
Out[15]:
5_days_back_sum diff value
0 21 0 10
1 21 0 11
2 36 1 15
3 58 2 2
4 58 2 5
5 58 2 7
6 58 2 8
7 9 10 9
8 32 11 23
9 46 15 14
10 29 18 15
How can I achieve that? Originally I have a date column to create the diff column, if that helps its available
Upvotes: 1
Views: 79
Reputation: 863301
Use custom function with boolean indexing
for filtering range with sum
:
def f(x):
return df.loc[(df['diff'] >= x - 5) & (df['diff'] <= x), 'value'].sum()
df['5_days_back_sum'] = df['diff'].apply(f)
print (df)
diff value 5_days_back_sum
0 0 10 21
1 0 11 21
2 1 15 36
3 2 2 58
4 2 5 58
5 2 7 58
6 2 8 58
7 10 9 9
8 11 23 32
9 15 14 46
10 18 15 29
Similar solution with between
:
def f(x):
return df.loc[df['diff'].between(x - 5, x), 'value'].sum()
df['5_days_back_sum'] = df['diff'].apply(f)
print (df)
diff value 5_days_back_sum
0 0 10 21
1 0 11 21
2 1 15 36
3 2 2 58
4 2 5 58
5 2 7 58
6 2 8 58
7 10 9 9
8 11 23 32
9 15 14 46
10 18 15 29
Upvotes: 3