Reputation: 73
I am new in Python and am currently facing an issue I can't solve. I really hope you can help me out. English is not my native language so I am sorry if I am not able to express myself properly.
Lets suppose i have a data frame like:
import pandas as pd
df = pd.DataFrame({'a': [1111,2222,3333,4444,5555,6666,7777,8888,9999], 'b':[0,0,1,0,1,0,0,0,1]})
I need to sum of the data in "a" until the condition "there is a value in b" is reached. This means for the given Data Frame:
At index=2 there is a 1 in b --> sum rows 0+1+2 = 6666
At index=4 there is a 1 in b --> sum rows 3+4 = 9999
At index=8 there is a 1 in b --> sum rows 5+6+7+8 = 33330
I tried if else cases, but with no satisfactorily output..
greetings
Upvotes: 3
Views: 2037
Reputation: 1
it is possible to do that
df.merge(df.groupby(df['b'].shift(fill_value=0).cumsum())['a'].cumsum().to_frame().rename(columns={'a':'cumul_sum'}), left_index=True, right_index=True)
and the result is
╭────┬──────┬─────┬─────────────╮
│ │ a │ b │ cumul_sum │
├────┼──────┼─────┼─────────────┤
│ 0 │ 1111 │ 0 │ 1111 │
│ 1 │ 2222 │ 0 │ 3333 │
│ 2 │ 3333 │ 1 │ 6666 │
│ 3 │ 4444 │ 0 │ 4444 │
│ 4 │ 5555 │ 1 │ 9999 │
│ 5 │ 6666 │ 0 │ 6666 │
│ 6 │ 7777 │ 0 │ 14443 │
│ 7 │ 8888 │ 0 │ 23331 │
│ 8 │ 9999 │ 1 │ 33330 │
╰────┴──────┴─────┴─────────────╯
Upvotes: 0
Reputation: 862591
Use Series.shift
with cumulative sum by Series.cumsum
and then aggregate sum
:
df = df.groupby(df.b.shift(fill_value=0).cumsum())['a'].sum().rename_axis(None).to_frame()
print (df)
a
0 6666
1 9999
2 33330
For new column use GroupBy.transform
with sum
first and then set 0
if no match 1
in b
by numpy.where
:
s = df.groupby(df.b.shift(fill_value=0).cumsum())['a'].transform('sum')
df['cumsum'] = np.where(df.b == 1, s, 0)
print (df)
a b cumsum
0 1111 0 0
1 2222 0 0
2 3333 1 6666
3 4444 0 0
4 5555 1 9999
5 6666 0 0
6 7777 0 0
7 8888 0 0
8 9999 1 33330
Upvotes: 3
Reputation: 30971
Run:
df.a.groupby(df.b[::-1].cumsum()).sum()\
.sort_index(ascending=False).reset_index(drop=True).to_frame()
Note that grouping is performed in reversed order of b, so looking at this column in "forward" order, each value of 1 terminates the current group.
"Post-processing" steps involve:
Upvotes: 1