Mike.J
Mike.J

Reputation: 117

Rolling.sum() with all previous occurrences?

I've been looking through the .rolling documentation and it seems like you need to provide a window, but all I would like to do is just sum all previous occurrences of a certain column in order of occurs.

import pandas as pd
import numpy as np

df = pd.DataFrame({'person':['john','mike','john','mike','john','mike','john','mike','john','mike'],
                   'star':[1,1,1,1,1,1,1,1,1,1],
                   'occurs':[1001,1001,1002,1002,1003,1003,1004,1004,1005,1005],
                    })

enter image description here

Here is what I tried, this is not the correct solution.

df['b'] = df.groupby('person')['star'].transform(lambda x: x.rolling(2).sum().shift())

Desired Result:

df_result = pd.DataFrame({'person':['john','mike','john','mike','john','mike','john','mike','john','mike'],
                          'star':[1,1,1,1,1,1,1,1,1,1],
                          'occurs':[1001,1001,1002,1002,1003,1003,1004,1004,1005,1005],
                          'b':[None,None,1,1,2,2,3,3,4,4],
                          })

enter image description here

Upvotes: 0

Views: 123

Answers (1)

Carlos Azevedo
Carlos Azevedo

Reputation: 680

You were almost there. Instead of the rolling window you can use cumsum()

import pandas as pd
import numpy as np

df = pd.DataFrame({'person':['john','mike','john','mike','john','mike','john','mike','john','mike'],
                   'star':[1,1,1,1,1,1,1,1,1,1],
                   'occurs':[1001,1001,1002,1002,1003,1003,1004,1004,1005,1005],
                    })

df['b'] = df.groupby('person')['star'].cumsum() - 1
df.loc[df['b'] == 0, 'b'] = np.nan
df

The -1 and the df.loc[df['b'] == 0, 'b'] = np.nan is just to get the exact same result that you want.

Upvotes: 2

Related Questions