Groupby two columns pandas dataframe and shift().rolling()

Question

I want to groupby two columns, 'Y' and 'A', then shift().rolling() for column 'ValueA'. I tried this code but result is not correct.

Code

df = pd.DataFrame({
    'Y' : [0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1],
    'A' : ['b','c','a','c','a','c','b','c','a', 'a', 'b', 'b','c','a','a','b'],
    'B': ['a', 'a', 'b', 'b','c','a','a','b','b','c','a','c','a','c','b','c'],
    'ValueA':[1,2,2,1,2,4,7,1,3,2,4,3,1,2,4,5],
    'ValueB':[3,2,4,3,1,2,4,5,1,2,2,1,2,4,7,1]
})
df['ValueX'] = df.groupby(['Y','A'])['ValueA'].shift().rolling(3, min_periods=3).sum()

Output for 'A' == a

    Y   A   B   ValueA  ValueB  ValueX
2   0   a   b   2       4       NaN
4   1   a   c   2       1       NaN
8   1   a   b   3       1       NaN
9   1   a   c   2       2       9.0
13  1   a   c   2       4       7.0
14  1   a   b   4       7       5.0

Expected Output

    Y   A   B   ValueA  ValueB  ValueX
2   0   a   b   2       4       NaN
4   1   a   c   2       1       NaN
8   1   a   b   3       1       NaN
9   1   a   c   2       2       NaN
13  1   a   c   2       4       7.0
14  1   a   b   4       7       7.0

Shubham Sharma · Accepted Answer

We need to to perform both shift and rolling operation per group, but instead you are performing the shift operation per group then rolling operation for the entire column which is producing the incorrect output.

df['ValueX'] = df.groupby(['Y', 'A'])['ValueA']\
                 .apply(lambda v: v.shift().rolling(3).sum())

print(df)

    Y  A  B  ValueA  ValueB  ValueX
0   0  b  a       1       3     NaN
1   0  c  a       2       2     NaN
2   0  a  b       2       4     NaN
3   1  c  b       1       3     NaN
4   1  a  c       2       1     NaN
5   1  c  a       4       2     NaN
6   1  b  a       7       4     NaN
7   1  c  b       1       5     NaN
8   1  a  b       3       1     NaN
9   1  a  c       2       2     NaN
10  1  b  a       4       2     NaN
11  1  b  c       3       1     NaN
12  1  c  a       1       2     6.0
13  1  a  c       2       4     7.0
14  1  a  b       4       7     7.0
15  1  b  c       5       1    14.0

As a side note, you don't have to explicitly specify the min_periods optional argument, it will default to the window size if not specified.

Groupby two columns pandas dataframe and shift().rolling()

Answers (1)

Related Questions