luka
luka

Reputation: 529

Groupby two columns pandas dataframe and shift().rolling()

I want to groupby two columns, 'Y' and 'A', then shift().rolling() for column 'ValueA'. I tried this code but result is not correct.

Code

df = pd.DataFrame({
    'Y' : [0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1],
    'A' : ['b','c','a','c','a','c','b','c','a', 'a', 'b', 'b','c','a','a','b'],
    'B': ['a', 'a', 'b', 'b','c','a','a','b','b','c','a','c','a','c','b','c'],
    'ValueA':[1,2,2,1,2,4,7,1,3,2,4,3,1,2,4,5],
    'ValueB':[3,2,4,3,1,2,4,5,1,2,2,1,2,4,7,1]
})
df['ValueX'] = df.groupby(['Y','A'])['ValueA'].shift().rolling(3, min_periods=3).sum()

Output for 'A' == a

    Y   A   B   ValueA  ValueB  ValueX
2   0   a   b   2       4       NaN
4   1   a   c   2       1       NaN
8   1   a   b   3       1       NaN
9   1   a   c   2       2       9.0
13  1   a   c   2       4       7.0
14  1   a   b   4       7       5.0

Expected Output

    Y   A   B   ValueA  ValueB  ValueX
2   0   a   b   2       4       NaN
4   1   a   c   2       1       NaN
8   1   a   b   3       1       NaN
9   1   a   c   2       2       NaN
13  1   a   c   2       4       7.0
14  1   a   b   4       7       7.0

Upvotes: 1

Views: 711

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71687

We need to to perform both shift and rolling operation per group, but instead you are performing the shift operation per group then rolling operation for the entire column which is producing the incorrect output.

df['ValueX'] = df.groupby(['Y', 'A'])['ValueA']\
                 .apply(lambda v: v.shift().rolling(3).sum())

print(df)

    Y  A  B  ValueA  ValueB  ValueX
0   0  b  a       1       3     NaN
1   0  c  a       2       2     NaN
2   0  a  b       2       4     NaN
3   1  c  b       1       3     NaN
4   1  a  c       2       1     NaN
5   1  c  a       4       2     NaN
6   1  b  a       7       4     NaN
7   1  c  b       1       5     NaN
8   1  a  b       3       1     NaN
9   1  a  c       2       2     NaN
10  1  b  a       4       2     NaN
11  1  b  c       3       1     NaN
12  1  c  a       1       2     6.0
13  1  a  c       2       4     7.0
14  1  a  b       4       7     7.0
15  1  b  c       5       1    14.0

As a side note, you don't have to explicitly specify the min_periods optional argument, it will default to the window size if not specified.

Upvotes: 2

Related Questions