Reputation: 143
I am trying to do a cumsum grouped by the name for the following dataframe:
df = pd.DataFrame( columns = ['Name','Status'])
df['Name'] = ['Anton','Anton','Anton','Anton','Bev','Bev','Bev','Bev','Chad','Chad','Chad','Chad']
df['Status'] =[1,1,1,1,1,1,0,1,0,0,1,1]
df
The first instance for each name should start at 1 and it should only sum thereafter if the status column is equal to 1, if not the value should carry over in order to give the additional column as follows:
df['Expected Result'] = [1,2,3,4,1,2,2,3,1,1,1,2]
df
I have tried doing a groupby on name and using the cumsum()+1 function but I am not sure on how to proceed.
Upvotes: 0
Views: 124
Reputation: 323396
We can do transform
to adjust
df.groupby('Name')['Status'].cumsum() + 1-df.groupby('Name')['Status'].transform('first')
Out[8]:
0 1
1 2
2 3
3 4
4 1
5 2
6 2
7 3
8 1
9 1
10 2
11 3
Name: Status, dtype: int64
Upvotes: 1
Reputation: 1
have you tried df.groupby(['Name']).cumsum() ? this is the result:
Status
0 1
1 2
2 3
3 1
4 1
5 2
6 2
7 3
8 0
9 0
10 1
11 2
Upvotes: 0
Reputation: 8229
Is this what you want?
df['Expected Result'] = df.groupby('Name').cumsum()
df
produces
Name Status Expected Result
0 Anton 1 1
1 Anton 1 2
2 Anton 1 3
3 Anton 1 4
4 Bev 1 1
5 Bev 1 2
6 Bev 0 2
7 Bev 1 3
8 Chad 0 0
9 Chad 0 0
10 Chad 1 1
11 Chad 1 2
Upvotes: 0