Reputation: 1691
I have a rolling sum calculated on a grouped data frame but its adding up the wrong way, it is a sum of the future, when I need a sum of the past.
What am I doing wrong here?
I import the data and sort by Dimension and Date (I have tried removing the date sort already)
df = pd.read_csv('Input.csv', parse_dates=True)
df.sort_values(['Dimension','Date'])
print(df)
I then create a new column which is a multi index grouped by rolling window
new_column = df.groupby('Dimension').Value1.apply(lambda x:
x.rolling(window=3).sum())
I then reset the index to be the same as the original
df['Sum_Value1'] = new_column.reset_index(level=0, drop=True)
print(df)
I have also tried reversing the index before the calculation, but that also failed.
Input
Dimension,Date,Value1,Value2
1,4/30/2002,10,20
1,1/31/2002,10,20
1,10/31/2001,10,20
1,7/31/2001,10,20
1,4/30/2001,10,20
1,1/31/2001,10,20
1,10/31/2000,10,20
2,4/30/2002,10,20
2,1/31/2002,10,20
2,10/31/2001,10,20
2,7/31/2001,10,20
2,4/30/2001,10,20
2,1/31/2001,10,20
2,10/31/2000,10,20
3,4/30/2002,10,20
3,1/31/2002,10,20
3,10/31/2001,10,20
3,7/31/2001,10,20
3,1/31/2001,10,20
3,10/31/2000,10,20
Output:
Dimension Date Value1 Value2 Sum_Value1
0 1 4/30/2002 10 20 NaN
1 1 1/31/2002 10 20 NaN
2 1 10/31/2001 10 20 30.0
3 1 7/31/2001 10 20 30.0
4 1 4/30/2001 10 20 30.0
5 1 1/31/2001 10 20 30.0
6 1 10/31/2000 10 20 30.0
7 2 4/30/2002 10 20 NaN
8 2 1/31/2002 10 20 NaN
9 2 10/31/2001 10 20 30.0
10 2 7/31/2001 10 20 30.0
11 2 4/30/2001 10 20 30.0
12 2 1/31/2001 10 20 30.0
13 2 10/31/2000 10 20 30.0
Goal Output:
Dimension Date Value1 Value2 Sum_Value1
0 1 4/30/2002 10 20 30.0
1 1 1/31/2002 10 20 30.0
2 1 10/31/2001 10 20 30.0
3 1 7/31/2001 10 20 30.0
4 1 4/30/2001 10 20 30.0
5 1 1/31/2001 10 20 NaN
6 1 10/31/2000 10 20 NaN
7 2 4/30/2002 10 20 30.0
8 2 1/31/2002 10 20 30.0
9 2 10/31/2001 10 20 30.0
10 2 7/31/2001 10 20 30.0
11 2 4/30/2001 10 20 30.0
12 2 1/31/2001 10 20 Nan
13 2 10/31/2000 10 20 NaN
Upvotes: 13
Views: 19357
Reputation: 8986
Just had to do the same thing myself and came up with a simple one-liner:
df['Sum_Value1'] = df['Value1'].iloc[::-1].rolling(window = 3).sum()
Upvotes: 0
Reputation: 29
You can use
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer
df = pd.read_csv(r'C:\Users\xxxx\python\data.txt')
indexer = FixedForwardWindowIndexer(window_size=3)
df1 = df.join(df.groupby('Dimension')['Value1'].rolling(indexer, min_periods=3).sum().to_frame().reset_index(), rsuffix='_sum')
del df1['Dimension_sum']
del df1['level_1']
df1
Input:
Dimension Date Value1 Value2
0 1 4/30/2002 10 20
1 1 1/31/2002 10 20
2 1 10/31/2001 10 20
3 1 7/31/2001 10 20
4 1 4/30/2001 10 20
5 1 1/31/2001 10 20
6 1 10/31/2000 10 20
7 2 4/30/2002 10 20
8 2 1/31/2002 10 20
9 2 10/31/2001 10 20
10 2 7/31/2001 10 20
11 2 4/30/2001 10 20
12 2 1/31/2001 10 20
13 2 10/31/2000 10 20
14 3 4/30/2002 10 20
15 3 1/31/2002 10 20
16 3 10/31/2001 10 20
17 3 7/31/2001 10 20
18 3 1/31/2001 10 20
19 3 10/31/2000 10 20
OUTPUT:
Dimension Date Value1 Value2 Value1_sum
0 1 4/30/2002 10 20 30.0
1 1 1/31/2002 10 20 30.0
2 1 10/31/2001 10 20 30.0
3 1 7/31/2001 10 20 30.0
4 1 4/30/2001 10 20 30.0
5 1 1/31/2001 10 20 NaN
6 1 10/31/2000 10 20 NaN
7 2 4/30/2002 10 20 30.0
8 2 1/31/2002 10 20 30.0
9 2 10/31/2001 10 20 30.0
10 2 7/31/2001 10 20 30.0
11 2 4/30/2001 10 20 30.0
12 2 1/31/2001 10 20 NaN
13 2 10/31/2000 10 20 NaN
14 3 4/30/2002 10 20 30.0
15 3 1/31/2002 10 20 30.0
16 3 10/31/2001 10 20 30.0
17 3 7/31/2001 10 20 30.0
18 3 1/31/2001 10 20 NaN
19 3 10/31/2000 10 20 NaN
Upvotes: 2
Reputation: 625
def reverse_rolling(series, window, func):
index = series.index
series = pd.DataFrame(series.iloc[::-1])
series = series.rolling(window, 1).apply(func)
series = series.iloc[::-1]
series['index'] = index
series = series.set_index('index')
return series[0]
Upvotes: 0
Reputation: 61
Rolling backwards is the same as rolling forward and then shifting the result:
x.rolling(window=3).sum().shift(-2)
Upvotes: 6
Reputation: 32105
You need a backward sum, therefore reverse your series before sum rolling it:
lambda x: x[::-1].rolling(window=3).sum()
Upvotes: 11
Reputation: 215037
You can shift the result by window-1
to get the left aligned results:
df["sum_value1"] = (df.groupby('Dimension').Value1
.apply(lambda x: x.rolling(window=3).sum().shift(-2)))
Upvotes: 6