Krishnang K Dalal
Krishnang K Dalal

Reputation: 2566

Pandas: GroupBy Shift And Cumulative Sum

I want to do groupby, shift and cumsum which seems pretty trivial task but still banging my head over the result I'm getting. Can someone please tell what am I doing wrong. All the results I found online shows the same or the same variation of what I am doing. Below is my implementation.

temp = pd.DataFrame(data=[['a',1],['a',1],['a',1],['b',1],['b',1],['b',1],['c',1],['c',1]], columns=['ID','X'])

temp['transformed'] = temp.groupby('ID')['X'].cumsum().shift()
print(temp)

   ID   X   transformed
0   a   1   NaN
1   a   1   1.0
2   a   1   2.0
3   b   1   3.0
4   b   1   1.0
5   b   1   2.0
6   c   1   3.0
7   c   1   1.0

This is wrong because the actual or what I am looking for is as below:

   ID   X   transformed
0   a   1   NaN
1   a   1   1.0
2   a   1   2.0
3   b   1   NaN
4   b   1   1.0
5   b   1   2.0
6   c   1   NaN
7   c   1   1.0

Thanks a lot in advance.

Upvotes: 12

Views: 6366

Answers (3)

Kazu
Kazu

Reputation: 51

While working on this problem, as the DataFrame size grows, using lambdas on transform starts to get very slow. I found out that using some DataFrameGroupBy methods (like cumsum and shift instead of lambdas are much faster.

So here's my proposed solution, creating a 'temp' column to save the cumsum for each ID and then shifting in a different groupby:

df['temp'] = df.groupby("ID")['X'].cumsum()
df['transformed'] = df.groupby("ID")['temp'].shift()
df = df.drop(columns=["temp"])

Upvotes: 5

leerssej
leerssej

Reputation: 14988

You could use transform() to feed the separate groups that are created at each level of groupby into the cumsum() and shift() methods.

temp['transformed'] = \
    temp.groupby('ID')['X'].transform(lambda x: x.cumsum().shift())
  ID  X   transformed
0  a  1   NaN
1  a  1   1.0
2  a  1   2.0
3  b  1   NaN
4  b  1   1.0
5  b  1   2.0
6  c  1   NaN
7  c  1   1.0

For more info on transform() please see here:

Upvotes: 10

BENY
BENY

Reputation: 323326

You need using apply , since one function is under groupby object which is cumsum another function shift is for all df

temp['transformed'] = temp.groupby('ID')['X'].apply(lambda x  : x.cumsum().shift())
temp
Out[287]: 
  ID  X  transformed
0  a  1          NaN
1  a  1          1.0
2  a  1          2.0
3  b  1          NaN
4  b  1          1.0
5  b  1          2.0
6  c  1          NaN
7  c  1          1.0

Upvotes: 4

Related Questions