Reputation: 4408
I have a dataset, where I would like to sum multiple columns and then create multiple columns with a running total. The output has to be grouped by id and date.
Data
id date t1 t2 total start curr_t2 curr_t2
a q1 22 4 1 5 50 25 20
a q2 22 1 1 2 50 25 20
a q3 22 0 0 0 50 25 20
b q1 22 5 5 10 100 30 40
b q2 22 2 2 4 100 30 70
b q3 22 3 4 7 100 30 70
Desired
id date t1 t2 total start cur_t1 cur_t2 final finalt1 finalt2
a q1 22 4 1 5 50 25 20 55 29 21
a q2 22 1 1 2 50 25 20 57 30 22
a q3 22 0 0 0 50 25 20 57 30 22
b q1 22 5 5 10 100 30 40 110 35 27
b q2 22 2 2 4 100 30 70 114 37 29
b q3 22 3 4 7 100 30 70 121 40 33
**Logic:**
sum the 'start' and 'total' column to create the 'final' column
however, final column is a running total, adding to the 'total' column.
EX.
start = 50 total = 5 so final = 55
then
the next row we have a total of 2 , so the final for the next row is 57
Similar logic with the newly created: finalt1 and finalt2
etc
Doing
I believe I have to use the .cumsum function to obtain the desired output The output has to be grouped by id and date
out['final'] = out['final'].sub(out.groupby('id')['start'].cumsum())
However, this output is not grouping by pod and date. I am still researching, any suggestion is appreciated.
Upvotes: 0
Views: 200
Reputation: 323316
Try with
df['final'] = df['start'].add(df.groupby('id')['total'].cumsum())
df['final1'] = df['t1'].add(df.groupby('id')['curr_t1'].cumsum())
df['final2'] = df['t2'].add(df.groupby('id')['curr_t2'].cumsum())
Upvotes: 1