Reputation: 529
I want to groupby two columns, 'Y' and 'A', then shift().rolling()
for column 'ValueA'.
I tried this code but result is not correct.
Code
df = pd.DataFrame({
'Y' : [0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1],
'A' : ['b','c','a','c','a','c','b','c','a', 'a', 'b', 'b','c','a','a','b'],
'B': ['a', 'a', 'b', 'b','c','a','a','b','b','c','a','c','a','c','b','c'],
'ValueA':[1,2,2,1,2,4,7,1,3,2,4,3,1,2,4,5],
'ValueB':[3,2,4,3,1,2,4,5,1,2,2,1,2,4,7,1]
})
df['ValueX'] = df.groupby(['Y','A'])['ValueA'].shift().rolling(3, min_periods=3).sum()
Output for 'A' == a
Y A B ValueA ValueB ValueX
2 0 a b 2 4 NaN
4 1 a c 2 1 NaN
8 1 a b 3 1 NaN
9 1 a c 2 2 9.0
13 1 a c 2 4 7.0
14 1 a b 4 7 5.0
Expected Output
Y A B ValueA ValueB ValueX
2 0 a b 2 4 NaN
4 1 a c 2 1 NaN
8 1 a b 3 1 NaN
9 1 a c 2 2 NaN
13 1 a c 2 4 7.0
14 1 a b 4 7 7.0
Upvotes: 1
Views: 711
Reputation: 71687
We need to to perform both shift
and rolling
operation per group, but instead you are performing the shift
operation per group then rolling operation for the entire column which is producing the incorrect output.
df['ValueX'] = df.groupby(['Y', 'A'])['ValueA']\
.apply(lambda v: v.shift().rolling(3).sum())
print(df)
Y A B ValueA ValueB ValueX
0 0 b a 1 3 NaN
1 0 c a 2 2 NaN
2 0 a b 2 4 NaN
3 1 c b 1 3 NaN
4 1 a c 2 1 NaN
5 1 c a 4 2 NaN
6 1 b a 7 4 NaN
7 1 c b 1 5 NaN
8 1 a b 3 1 NaN
9 1 a c 2 2 NaN
10 1 b a 4 2 NaN
11 1 b c 3 1 NaN
12 1 c a 1 2 6.0
13 1 a c 2 4 7.0
14 1 a b 4 7 7.0
15 1 b c 5 1 14.0
As a side note, you don't have to explicitly specify the min_periods
optional argument, it will default to the window size if not specified.
Upvotes: 2