Anku
Anku

Reputation: 215

Python Pandas Cumulative Sum across columns and get the result in another new columns

I have dataframe which has col1-col10, I want to calculate cumulative sum across columns and create new columns on the go i.e. cum_col1-cum_col10. I looked into cumsum(), but that gives final cumulative sum. How to achieve cumulative sum while creating new columns.

Dataframe looks like:

id    col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
1     5    5     5   5    0    0    1    4    3    3 
2     0    0     0   0    0    0    4    4    0    0 
3     0    1     2   3    0    0    0    5    6    0

Expected results:

id  cum_col1 cum_col2 cum_col3 cum_col4 cum_col5 cum_col6 cum_col7 cum_col8 cum_col9 cum_col10
1   5        10       15       20       20       20       21       25       28       31 
2   0        0        0        0        0        0        4        8        8        8 
3   0        1        3        6        6        6        6        11       17       17

The DDL to generate the dataframe:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'col1': [5, 0, 0],
                   'col2': [5, 0, 1],
                   'col3': [5, 0, 2],
                   'col4': [5, 0, 3],
                   'col5' : [0, 0, 0],
                   'col6': [0, 0, 0],
                   'col7': [1, 4, 0],
                   'col8': [4, 4, 5],
                   'col9': [3, 0, 6],
                   'col10': [3, 0, 0]})

Thanks!

Upvotes: 0

Views: 2974

Answers (2)

Serge Ballesta
Serge Ballesta

Reputation: 148910

The cumsum method does the trick, provided you hide the id column into the index:

df.set_index('id', append=True).cumsum(axis=1).rename(
    columns = lambda x: 'cum_' + x).reset_index(level='id')

Ig gives as expected:

   id  cum_col1  cum_col2  cum_col3  ...  cum_col7  cum_col8  cum_col9  cum_col10
0   1         5        10        15  ...        21        25        28         31
1   2         0         0         0  ...         4         8         8          8
2   3         0         1         3  ...         6        11        17         17

Upvotes: 1

obchardon
obchardon

Reputation: 10792

You can apply an anonymous function to each line:

df = df.groupby('id').apply(lambda x: x.drop('id', axis=1).add_prefix('cum_').cumsum(axis=1).sum()).reset_index()

The function add_prefix() allow you to add a prefix to each column name.

Result:

   id  cum_col1  cum_col2  cum_col3  ...  cum_col7  cum_col8  cum_col9  cum_col10
0   1         5        10        15  ...        21        25        28         31
1   2         0         0         0  ...         4         8         8          8
2   3         0         1         3  ...         6        11        17         17

Upvotes: 2

Related Questions