Reputation: 73

DataFrame: cumulative sum of column until condition is reached and return sum in new column

I am new in Python and am currently facing an issue I can't solve. I really hope you can help me out. English is not my native language so I am sorry if I am not able to express myself properly.

Lets suppose i have a data frame like:

import pandas as pd
df = pd.DataFrame({'a': [1111,2222,3333,4444,5555,6666,7777,8888,9999], 'b':[0,0,1,0,1,0,0,0,1]})

I need to sum of the data in "a" until the condition "there is a value in b" is reached. This means for the given Data Frame:

At index=2 there is a 1 in b --> sum rows 0+1+2 = 6666
At index=4 there is a 1 in b --> sum rows 3+4 = 9999
At index=8 there is a 1 in b --> sum rows 5+6+7+8 = 33330

I tried if else cases, but with no satisfactorily output..

greetings

Upvotes: 3

Answers (3)

Ronan

Reputation: 1

it is possible to do that

df.merge(df.groupby(df['b'].shift(fill_value=0).cumsum())['a'].cumsum().to_frame().rename(columns={'a':'cumul_sum'}), left_index=True, right_index=True)

and the result is

╭────┬──────┬─────┬─────────────╮
│    │    a │   b │   cumul_sum │
├────┼──────┼─────┼─────────────┤
│  0 │ 1111 │   0 │        1111 │
│  1 │ 2222 │   0 │        3333 │
│  2 │ 3333 │   1 │        6666 │
│  3 │ 4444 │   0 │        4444 │
│  4 │ 5555 │   1 │        9999 │
│  5 │ 6666 │   0 │        6666 │
│  6 │ 7777 │   0 │       14443 │
│  7 │ 8888 │   0 │       23331 │
│  8 │ 9999 │   1 │       33330 │
╰────┴──────┴─────┴─────────────╯

Upvotes: 0

jezrael

Reputation: 863711

Use Series.shift with cumulative sum by Series.cumsum and then aggregate sum:

df = df.groupby(df.b.shift(fill_value=0).cumsum())['a'].sum().rename_axis(None).to_frame()
print (df)
       a
0   6666
1   9999
2  33330

For new column use GroupBy.transform with sum first and then set 0 if no match 1 in b by numpy.where:

s = df.groupby(df.b.shift(fill_value=0).cumsum())['a'].transform('sum')
df['cumsum'] = np.where(df.b == 1, s, 0)

print (df)
      a  b  cumsum
0  1111  0       0
1  2222  0       0
2  3333  1    6666
3  4444  0       0
4  5555  1    9999
5  6666  0       0
6  7777  0       0
7  8888  0       0
8  9999  1   33330

Upvotes: 3

Valdi_Bo

Reputation: 31011

Run:

df.a.groupby(df.b[::-1].cumsum()).sum()\
    .sort_index(ascending=False).reset_index(drop=True).to_frame()

Note that grouping is performed in reversed order of b, so looking at this column in "forward" order, each value of 1 terminates the current group.

"Post-processing" steps involve:

reverse the order (by index),
reset index,
convert to a DataFrame (if you need).

Upvotes: 1

DataFrame: cumulative sum of column until condition is reached and return sum in new column

Answers (3)

Related Questions