DaytaSigntist

Reputation: 117

Pandas groupby where the column value is greater than the group's x percentile

I have a pandas dataframe as follows:

import pandas as pd

df = pd.DataFrame()
df['Name'] = ['Abby', 'Abby', 'Abby', 'Abby', 'Abby', 'Daniel', 'Daniel', 'Daniel', 'Daniel', 'Daniel']
df['Marks'] = [100, 90, 76, 50, 10, 50, 45, 38, 25, 5]

I want to:

  1. Find the 40th percentile for each group
  2. Filter the dataframe so that only the rows with values above their group's 40th percentile are shown.

So, I have found the 40th percentile for each group using:

df.groupby('Name').quantile(0.4)

The aim is to get to the filtered dataframe (shown as an image in the original post).

My main issue is that the values for each group are not standardized and so I cannot apply an overall percentile value for the entire dataset.

But all the help I saw regarding filtering a dataframe with a certain value does not do it separately for each group. I have seen the following questions:

Pandas, groupby where column value is greater than x

Pandas Groupby apply function to count values greater than zero

My question essentially builds on a variation of the following question: Calculate Arbitrary Percentile on Pandas GroupBy

Is there a way to do this in Pandas?

Upvotes: 5

Views: 3938

Answers (2)

rafaelc

Reputation: 59274

Using your code for the percentiles, together with loc, ge for >= (or gt for >), and index alignment on Name:

df = df.set_index('Name')
df.loc[df.Marks.ge(df.groupby('Name').quantile(0.4).Marks)]

    Name    Marks
0   Abby    100
1   Abby    90
2   Abby    76
5   Daniel  50
6   Daniel  45
7   Daniel  38
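
A variant of the same idea, sketched here as an assumption rather than as the answer's exact code: instead of setting Name as the index, the per-group percentiles can be looked up with map, which keeps the default integer index. The names q and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Abby'] * 5 + ['Daniel'] * 5,
    'Marks': [100, 90, 76, 50, 10, 50, 45, 38, 25, 5],
})

# Series of 40th percentiles, indexed by group name.
q = df.groupby('Name')['Marks'].quantile(0.4)

# Map each row's Name to its group's percentile, then keep the rows
# whose Marks are >= that value (use .gt instead of .ge for strictly greater).
result = df[df['Marks'].ge(df['Name'].map(q))]
print(result)
```

With this data no mark equals its group's percentile, so ge and gt select the same rows.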

Upvotes: 1

BENY

Reputation: 323226

You can use transform:

df[df.Marks > df.groupby('Name').Marks.transform('quantile', 0.4)]
Out[712]: 
     Name  Marks
0    Abby    100
1    Abby     90
2    Abby     76
5  Daniel     50
6  Daniel     45
7  Daniel     38
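
To make the transform step easier to follow, here is a minimal self-contained sketch using the data from the question. It splits the one-liner into two steps; the names threshold and above are illustrative, and the lambda is just one way of passing the quantile level.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Abby'] * 5 + ['Daniel'] * 5,
    'Marks': [100, 90, 76, 50, 10, 50, 45, 38, 25, 5],
})

# transform returns a Series with the same index as df, where every row
# holds the 40th percentile of the group that row belongs to.
threshold = df.groupby('Name')['Marks'].transform(lambda s: s.quantile(0.4))

# Because threshold lines up row by row with df, an ordinary boolean mask
# filters each group against its own percentile.
above = df[df['Marks'] > threshold]
print(above)
```

The point of transform here is that the per-group value is broadcast back to the original rows, so no merge or index change is needed before comparing.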

Upvotes: 4
