Quincy
Quincy

Reputation: 143

How to do cumsum based on column condition?

I am trying to do a cumsum grouped by the name for the following dataframe:

df = pd.DataFrame( columns = ['Name','Status']) 
df['Name'] = ['Anton','Anton','Anton','Anton','Bev','Bev','Bev','Bev','Chad','Chad','Chad','Chad']
df['Status'] =[1,1,1,1,1,1,0,1,0,0,1,1] 
df

The first instance for each name should start at 1 and it should only sum thereafter if the status column is equal to 1, if not the value should carry over in order to give the additional column as follows:

df['Expected Result'] = [1,2,3,4,1,2,2,3,1,1,1,2]
df

I have tried doing a groupby on name and using the cumsum()+1 function but I am not sure on how to proceed.

Upvotes: 0

Views: 124

Answers (3)

BENY
BENY

Reputation: 323396

We can do transform to adjust

df.groupby('Name')['Status'].cumsum() + 1-df.groupby('Name')['Status'].transform('first')
Out[8]: 
0     1
1     2
2     3
3     4
4     1
5     2
6     2
7     3
8     1
9     1
10    2
11    3
Name: Status, dtype: int64

Upvotes: 1

justmeh
justmeh

Reputation: 1

have you tried df.groupby(['Name']).cumsum() ? this is the result:

    Status
0        1
1        2
2        3
3        1
4        1
5        2
6        2
7        3
8        0
9        0
10       1
11       2

Upvotes: 0

piterbarg
piterbarg

Reputation: 8229

Is this what you want?

df['Expected Result'] = df.groupby('Name').cumsum()
df

produces

    Name    Status  Expected Result
0   Anton   1   1
1   Anton   1   2
2   Anton   1   3
3   Anton   1   4
4   Bev     1   1
5   Bev     1   2
6   Bev     0   2
7   Bev     1   3
8   Chad    0   0
9   Chad    0   0
10  Chad    1   1
11  Chad    1   2

Upvotes: 0

Related Questions