Reputation: 7448
I have the following df
,
code pct year_month
10 6.6156 201905
10 6.0868 201905
10 5.8975 201905
10 11.2195 201905
10 11.1404 201905
I like to do the following
df2 = df.sort_values('pct', ascending=False)
df2['pct'].cumsum().le(20).mean()
0.2
in a groupby
way,
df.groupby(['year_month', 'code'])['pct'].apply(lambda x: x.cumsum().le(20).mean())
but the result is different,
year_month code
201905 BR10 0.6
I thought groupby
should sort pct
in descending order by default, but it seems not, so I am wondering how to sort pct
within in each year_month
, code
group first then do the cumsum
;
Upvotes: 2
Views: 310
Reputation: 862511
Your code is different, for same output need sorting by first 2 columns - ['year_month','code']
or with this sample data omit it (if already sorted):
print (df['pct'].cumsum().le(20).mean())
0.6
df2 = df.sort_values(['year_month','code'], ascending=False)
print (df2['pct'].cumsum().le(20).mean())
0.6
In groupby are sorting values in groupby
- here ['year_month', 'code']
, not pct
:
df = df.groupby(['year_month', 'code'])['pct'].apply(lambda x: x.cumsum().le(20).mean())
print (df)
year_month code
201905 10 0.6
Name: pct, dtype: float64
So for same output sorting here both by pct
column and if necessary for prevent sorting by 'year_month', 'code']
is added sort=False
:
df2 = df.sort_values('pct', ascending=False)
print (df2['pct'].cumsum().le(20).mean())
0.2
df = (df.sort_values(['pct'], ascending=False)
.groupby(['year_month', 'code'], sort=False)['pct']
.apply(lambda x: x.cumsum().le(20).mean()))
print (df)
year_month code
201905 10 0.2
Name: pct, dtype: float64
Upvotes: 2