windwalker
windwalker

Reputation: 379

Get rows of max values for unique value in other column: python

I am trying to filter the following dataframe from the view seen in 'Initial dataframe' to what is displayed in 'Desired output'

Initial dataframe

name          group     subject  score   class_size
                                 
Steve       classrm_A   maths    98.22      20
John        classrm_A   maths    76.87      30
Mary        classrm_C   science  77.25      26
Steve       classrm_B   science  65.28      32
Mary        classrm_A   english  86.01      16
John        classrm_F   science  96.55      25

Return rows for unique 'name' values where score' is greatest and 'class_size' is equal or greater than 25.

Desired output:

name          group     subject  score   class_size
                                 
Steve       classrm_B   science  65.28      32
Mary        classrm_C   science  77.25      26
John        classrm_F   science  96.55      25

Here is what I have attempted so far.....

min_class = df["class_size"] >= 25


df = df["min_class "]


df = df.groupby(['name']).max('score')

Any help would be greatly appreciated.

Upvotes: 2

Views: 1629

Answers (2)

Andreas
Andreas

Reputation: 9197

Keep only rows with class size above 25 in dataframe, then sort dataframe by score, then drop all duplicates of column "name" and keep only the first row in case of duplicates.

df = df[df["class_size"] >= 25]
df = df.sort_values("score", ascending=False)
df = df.drop_duplicates(subset=["name"], keep="first")

Output:

Out[23]: 
    name      group  subject  score  class_size
5   John  classrm_F  science  96.55          25
2   Mary  classrm_C  science  77.25          26
3  Steve  classrm_B  science  65.28          32

Upvotes: 3

Georgina Skibinski
Georgina Skibinski

Reputation: 13377

This will do the trick for you. I sense it's slightly different approach, than you tried, but I dare say it's more "pythonish".

res = df.loc[df['class_size'].ge(25)].copy()
res['rank'] = res.groupby('name')['score'].rank(ascending=False)
res = res.loc[res['rank'].eq(1)].drop('rank', axis=1)

The key here is .rank() function with .groupby() - .rank(ascending=False) will essentially return ranking in a descending order, .groupby() will make you do the ranking for each group separately.

Upvotes: 1

Related Questions