Reputation: 7458
I have the following df
,
year_month pct
201903 50
201903 40
201903 5
201903 5
201904 90
201904 5
201904 5
I want to create a boolean column called non-tail
, which satisfies the following condition,
df.sort_values(['pct'], ascending=False).groupby('year_month')['pct'].apply(lambda x: x.cumsum().le(80))
that in non-tail
, any next value in pct
that will be added which makes cumsum immediately great than 80 will be mark as True
as well, so the result will look like
year_month pct non-tail
201903 50 True
201903 40 True
201903 5 False
201903 5 False
201904 90 True
201904 5 False
201904 5 False
Upvotes: 1
Views: 207
Reputation: 323346
What I will do
df.pct.iloc[::-1].groupby(df['year_month']).cumsum()>20
Out[306]:
6 False
5 False
4 True
3 False
2 False
1 True
0 True
Name: pct, dtype: bool
Upvotes: 3
Reputation: 150785
IIUC, you need to shift the cumsum
:
df = df.sort_values(['year_month','pct'], ascending=[True,False])
(df.groupby('year_month')['pct']
.apply(lambda x: x.cumsum().le(80)
.shift(fill_value=True)
)
)
gives you:
0 True
1 True
2 False
3 False
4 True
5 False
6 False
Name: pct, dtype: bool
Upvotes: 2