luca pescatore
luca pescatore

Reputation: 119

Top X% by group in pandas

I have some machines and their sales in different groups e.g. machine age. My goal is to isolate the top 5% machines (for sales amount) within each group.

I managed to do this on the full dataframe using the following code

df['quantile'] = pd.qcut(df.Sales, 20, labels=range(20))
top = df[df['quantile']==0]

But I cannot seem to manage to do the same when using groupby. Essentially the assignment does not work. Any clue?

Upvotes: 2

Views: 58

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150735

You can do groupby.transform and quantile:

df = pd.DataFrame({'a': np.random.randint(1,4,1000),
                   'Sales': np.random.uniform(0,1,1000)})
df['quantile'] = df.groupby('a').Sales.transform('quantile', 0.95)

#filter:
top = df[df.Sales.gt(df['quantile'])]

Upvotes: 2

Related Questions