Reputation: 4398
I have a dataframe, df, where I would like to take a cumulative sum based on a particular column, grouped by 2 diff columns.
Data
diff id date r_pwr d_pwr l_wr e_pwr final start
730 aa q122 34 0 101.5 0 135.5 134.5
730 aa q222 110 0 114.5 0 224.5 -90
500 bb q122 5 2 1 0 4 10
500 bb q222 1 0 1 0 2 8
Desired
diff id date r_pwr d_pwr l_wr e_pwr final start cumul.
730 aa q122 34 0 101.5 0 135.5 134.5 865.5
730 aa q222 110 0 114.5 0 224.5 -90 1090
500 bb q122 5 2 1 0 4 10 504
500 bb q222 1 0 1 0 2 8 506
Doing
df['cumul'] = df['diff'].add(df.groupby('id')['r_pwr'].cumsum()).sub(df.groupby('id')['d_pwr'].cumsum()).add(df.groupby('id')['l_wr'].cumsum()).sub(df.groupby('id')['ex_pwr'].cumsum())
Wonder how to utilize the 'final' column, as I feel I can incorp this to avoid lengthy line of code. Any suggestion is appreciated
Upvotes: 1
Views: 68
Reputation: 75080
You are correct, just having the cumulative sum of final col and addinf to diff would do:
df['cumul'] = df['diff'].add(df.groupby("id")['final'].cumsum())
diff id date r_pwr d_pwr l_wr e_pwr final start cumul
0 730 aa q122 34 0 101.5 0 135.5 134.5 865.5
1 730 aa q222 110 0 114.5 0 224.5 -90.0 1090.0
2 500 bb q122 5 2 1.0 0 4.0 10.0 504.0
3 500 bb q222 1 0 1.0 0 2.0 8.0 506.0
If you didnot have the final column, you could have directly taken the cumsum of those 4 cols, then apply the logic as below:
a = df.groupby("id")[['r_pwr','d_pwr','l_wr','e_pwr']].cumsum()
df['cumul'] = df['diff'].add(a[['r_pwr','l_wr']].sum(1)
.sub(a[['d_pwr','e_pwr']].sum(1)))
Upvotes: 2