tedioustortoise
tedioustortoise

Reputation: 269

Pandas Dataframe cumsum function to "restart" when value in row changes

df['total'] = (df.DR - df.CR).cumsum()

is giving:

...
    Name     DR       CR     total 
303   B3  46.80     0.00  46682.07                                                                                                         
304   B3  45.20     0.00  46727.27                                                                                                                                                                                                                
395  BS1   0.00    10.37  47905.31                                                                                                         
396  BS2   0.00    87.00  47818.31                                                                                                         
397    C   0.00   482.10  47336.21                                                                                                                                                                                                               
399    C  20.00     0.00  47356.21

However I would like the cumsum to "restart" whenever the "Name" column (B3,BS1,C) changes to a different value.

So the desired result is:

    Name     DR       CR     total 
303   B3  46.80     0.00     46.80                                                                                                         
304   B3  45.20     0.00      1.60                                                                                                                                                                                                                
395  BS1   0.00    10.37    -10.37                                                                                                         
396  BS2   0.00    87.00    -97.37                                                                                                         
397    C   0.00   482.10   -482.10                                                                                                                                                                                                               
399    C  20.00     0.00   -462.10 

I am new to pandas. Thanks for your help

I have tried things like, but not working: df['total'] = df.groupby('GL')[(df.DR - df.CR)].cumsum()

Upvotes: 0

Views: 1024

Answers (1)

Valdi_Bo
Valdi_Bo

Reputation: 30991

First variant - if you want to "join" all rows for each Name into a single group:

df['total'] = df.groupby('Name').apply(lambda grp:
    (grp.DR - grp.CR).cumsum()).reset_index(level=0, drop=True)

For your source data the result is:

    Name    DR      CR   total
303   B3  46.8    0.00   46.80
304   B3  45.2    0.00   92.00
395  BS1   0.0   10.37  -10.37
396  BS1   0.0   87.00  -97.37
397    C   0.0  482.10 -482.10
399    C  20.0    0.00 -462.10

Second variant - if any change in Name is to start a new group.

Assume that your DataFrame contains:

    Name    DR      CR
303   B3  46.8    0.00
304   B3  45.2    0.00
395  BS1   0.0   10.37
396  BS1   0.0   87.00
397    C   0.0  482.10
399    C  20.0    0.00
400   B3  53.0    8.00
401   B3  40.8    6.15

and the second B3 group is to be summed separately from the first group:

df['total'] = df.groupby((df.Name != df.Name.shift()).cumsum())\
    .apply(lambda grp: (grp.DR - grp.CR).cumsum()).reset_index(level=0, drop=True)

getting:

    Name    DR      CR   total
303   B3  46.8    0.00   46.80
304   B3  45.2    0.00   92.00
395  BS1   0.0   10.37  -10.37
396  BS1   0.0   87.00  -97.37
397    C   0.0  482.10 -482.10
399    C  20.0    0.00 -462.10
400   B3  53.0    8.00   45.00
401   B3  40.8    6.15   79.65

As you can see, the second B3 group is summed separately.

Upvotes: 1

Related Questions