Reputation: 146
I have a table of data such as:
F(1) F(2) F(3) Amount
A B C 100
A B C 100
A B C 100
D E F 300
D E F 150
G H I 100
G H I 200
I would like to produce a new column, showing the cumulative sum of field 'Amount', but one that resets to 0 whenever the key of columns F(1), F(2) and F(3) change.
i.e. I would like to create the following output (sans dotted lines!)
F(1) F(2) F(3) Amount CumSum
A B C 100 100
A B C 100 200
A B C 100 300
------------------------------ resets to zero as key changes
D E F 300 300
D E F 150 450
------------------------------ resets to zero as key changes
G H I 100 100
G H I 200 300
I have potentially up to a million rows in this table so I am looking for a robust implementation. Is pandas the way forward here? I have not used pandas before but am happy to explore.
Upvotes: 0
Views: 259
Reputation:
group by your key columns and call cumsum:
df['CumSum'] = df.groupby(['F(1)', 'F(2)', 'F(3)'])['Amount'].cumsum()
df
Out:
F(1) F(2) F(3) Amount CumSum
0 A B C 100 100
1 A B C 100 200
2 A B C 100 300
3 D E F 300 300
4 D E F 150 450
5 G H I 100 100
6 G H I 200 300
Upvotes: 1