daiyue
daiyue

Reputation: 7458

pandas use cumsum on a column and create a new boolean column that mark edge case as True

I have the following df,

year_month    pct
201903        50
201903        40
201903         5
201903         5
201904        90
201904         5
201904         5

I want to create a boolean column called non-tail, which satisfies the following condition,

df.sort_values(['pct'], ascending=False).groupby('year_month')['pct'].apply(lambda x: x.cumsum().le(80))

that in non-tail, any next value in pct that will be added which makes cumsum immediately great than 80 will be mark as True as well, so the result will look like

 year_month    pct    non-tail
 201903        50     True
 201903        40     True
 201903         5     False
 201903         5     False
 201904        90     True
 201904         5     False
 201904         5     False

Upvotes: 1

Views: 207

Answers (2)

BENY
BENY

Reputation: 323346

What I will do

df.pct.iloc[::-1].groupby(df['year_month']).cumsum()>20
Out[306]: 
6    False
5    False
4     True
3    False
2    False
1     True
0     True
Name: pct, dtype: bool

Upvotes: 3

Quang Hoang
Quang Hoang

Reputation: 150785

IIUC, you need to shift the cumsum:

df = df.sort_values(['year_month','pct'], ascending=[True,False])
(df.groupby('year_month')['pct']
   .apply(lambda x: x.cumsum().le(80)
                     .shift(fill_value=True)
         )
)

gives you:

0     True
1     True
2    False
3    False
4     True
5    False
6    False
Name: pct, dtype: bool

Upvotes: 2

Related Questions