running cumulative sum based on baseline column value (Python)

Question

I have a dataframe, df, where I would like to take a cumulative sum based on a particular column, grouped by 2 diff columns.

Data

diff    id  date    r_pwr   d_pwr   l_wr    e_pwr   final   start   
730     aa  q122    34      0       101.5   0       135.5   134.5   
730     aa  q222    110     0       114.5   0       224.5   -90 
500     bb  q122    5       2       1       0       4       10  
500     bb  q222    1       0       1       0       2       8

Desired

diff    id  date    r_pwr   d_pwr   l_wr    e_pwr   final   start   cumul.
730     aa  q122    34      0       101.5   0       135.5   134.5   865.5
730     aa  q222    110     0       114.5   0       224.5   -90     1090
500     bb  q122    5       2       1       0       4        10     504
500     bb  q222    1       0       1       0       2        8      506

Doing

df['cumul'] = df['diff'].add(df.groupby('id')['r_pwr'].cumsum()).sub(df.groupby('id')['d_pwr'].cumsum()).add(df.groupby('id')['l_wr'].cumsum()).sub(df.groupby('id')['ex_pwr'].cumsum())

Wonder how to utilize the 'final' column, as I feel I can incorp this to avoid lengthy line of code. Any suggestion is appreciated

anky · Accepted Answer

You are correct, just having the cumulative sum of final col and addinf to diff would do:

df['cumul'] = df['diff'].add(df.groupby("id")['final'].cumsum())

   diff  id  date  r_pwr  d_pwr   l_wr  e_pwr  final  start   cumul
0   730  aa  q122     34      0  101.5      0  135.5  134.5   865.5
1   730  aa  q222    110      0  114.5      0  224.5  -90.0  1090.0
2   500  bb  q122      5      2    1.0      0    4.0   10.0   504.0
3   500  bb  q222      1      0    1.0      0    2.0    8.0   506.0

If you didnot have the final column, you could have directly taken the cumsum of those 4 cols, then apply the logic as below:

a = df.groupby("id")[['r_pwr','d_pwr','l_wr','e_pwr']].cumsum()
df['cumul'] = df['diff'].add(a[['r_pwr','l_wr']].sum(1)
                        .sub(a[['d_pwr','e_pwr']].sum(1)))

running cumulative sum based on baseline column value (Python)

Answers (1)

Related Questions