Reputation: 215
I have dataframe which has col1-col10, I want to calculate cumulative sum across columns and create new columns on the go i.e. cum_col1-cum_col10. I looked into cumsum(), but that gives final cumulative sum. How to achieve cumulative sum while creating new columns.
Dataframe looks like:
id col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
1 5 5 5 5 0 0 1 4 3 3
2 0 0 0 0 0 0 4 4 0 0
3 0 1 2 3 0 0 0 5 6 0
Expected results:
id cum_col1 cum_col2 cum_col3 cum_col4 cum_col5 cum_col6 cum_col7 cum_col8 cum_col9 cum_col10
1 5 10 15 20 20 20 21 25 28 31
2 0 0 0 0 0 0 4 8 8 8
3 0 1 3 6 6 6 6 11 17 17
The DDL to generate the dataframe:
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3],
'col1': [5, 0, 0],
'col2': [5, 0, 1],
'col3': [5, 0, 2],
'col4': [5, 0, 3],
'col5' : [0, 0, 0],
'col6': [0, 0, 0],
'col7': [1, 4, 0],
'col8': [4, 4, 5],
'col9': [3, 0, 6],
'col10': [3, 0, 0]})
Thanks!
Upvotes: 0
Views: 2974
Reputation: 148910
The cumsum
method does the trick, provided you hide the id
column into the index:
df.set_index('id', append=True).cumsum(axis=1).rename(
columns = lambda x: 'cum_' + x).reset_index(level='id')
Ig gives as expected:
id cum_col1 cum_col2 cum_col3 ... cum_col7 cum_col8 cum_col9 cum_col10
0 1 5 10 15 ... 21 25 28 31
1 2 0 0 0 ... 4 8 8 8
2 3 0 1 3 ... 6 11 17 17
Upvotes: 1
Reputation: 10792
You can apply an anonymous function to each line:
df = df.groupby('id').apply(lambda x: x.drop('id', axis=1).add_prefix('cum_').cumsum(axis=1).sum()).reset_index()
The function add_prefix()
allow you to add a prefix to each column name.
Result:
id cum_col1 cum_col2 cum_col3 ... cum_col7 cum_col8 cum_col9 cum_col10
0 1 5 10 15 ... 21 25 28 31
1 2 0 0 0 ... 4 8 8 8
2 3 0 1 3 ... 6 11 17 17
Upvotes: 2