Reputation: 117
I've been looking through the .rolling documentation and it seems like you need to provide a window, but all I would like to do is just sum all previous occurrences of a certain column in order of occurs
.
import pandas as pd
import numpy as np
df = pd.DataFrame({'person':['john','mike','john','mike','john','mike','john','mike','john','mike'],
'star':[1,1,1,1,1,1,1,1,1,1],
'occurs':[1001,1001,1002,1002,1003,1003,1004,1004,1005,1005],
})
Here is what I tried, this is not the correct solution.
df['b'] = df.groupby('person')['star'].transform(lambda x: x.rolling(2).sum().shift())
Desired Result:
df_result = pd.DataFrame({'person':['john','mike','john','mike','john','mike','john','mike','john','mike'],
'star':[1,1,1,1,1,1,1,1,1,1],
'occurs':[1001,1001,1002,1002,1003,1003,1004,1004,1005,1005],
'b':[None,None,1,1,2,2,3,3,4,4],
})
Upvotes: 0
Views: 123
Reputation: 680
You were almost there. Instead of the rolling
window you can use cumsum()
import pandas as pd
import numpy as np
df = pd.DataFrame({'person':['john','mike','john','mike','john','mike','john','mike','john','mike'],
'star':[1,1,1,1,1,1,1,1,1,1],
'occurs':[1001,1001,1002,1002,1003,1003,1004,1004,1005,1005],
})
df['b'] = df.groupby('person')['star'].cumsum() - 1
df.loc[df['b'] == 0, 'b'] = np.nan
df
The -1
and the df.loc[df['b'] == 0, 'b'] = np.nan
is just to get the exact same result that you want.
Upvotes: 2