Get rows of max values for unique value in other column: python

Question

I am trying to filter the dataframe shown under 'Initial dataframe' so that it matches what is shown under 'Desired output'.

Initial dataframe

name   group      subject  score  class_size
Steve  classrm_A  maths    98.22          20
John   classrm_A  maths    76.87          30
Mary   classrm_C  science  77.25          26
Steve  classrm_B  science  65.28          32
Mary   classrm_A  english  86.01          16
John   classrm_F  science  96.55          25
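To reproduce the question, the initial dataframe can be built as follows. This is a sketch assuming pandas is installed; the values are copied from the table above, and the default integer index is used, which matches the index shown in the accepted answer's output.

```python
import pandas as pd

# Initial dataframe from the question (values copied from the table)
df = pd.DataFrame(
    {
        "name": ["Steve", "John", "Mary", "Steve", "Mary", "John"],
        "group": ["classrm_A", "classrm_A", "classrm_C", "classrm_B", "classrm_A", "classrm_F"],
        "subject": ["maths", "maths", "science", "science", "english", "science"],
        "score": [98.22, 76.87, 77.25, 65.28, 86.01, 96.55],
        "class_size": [20, 30, 26, 32, 16, 25],
    }
)
print(df)
```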

For each unique 'name', return the row with the greatest 'score' among the rows where 'class_size' is greater than or equal to 25.

Desired output:

name   group      subject  score  class_size
Steve  classrm_B  science  65.28          32
Mary   classrm_C  science  77.25          26
John   classrm_F  science  96.55          25

Here is what I have attempted so far:

min_class = df["class_size"] >= 25
df = df["min_class "]
df = df.groupby(['name']).max('score')

Any help would be greatly appreciated.
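A note on why the last line of the attempt does not give whole rows: groupby(...).max() takes the maximum of every column separately within each group, so the result can combine values that come from different original rows. A minimal sketch on the question's data, assuming pandas is installed (max() is called without arguments here, and the class_size filter is deliberately left out to show the effect):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Steve", "John", "Mary", "Steve", "Mary", "John"],
        "group": ["classrm_A", "classrm_A", "classrm_C", "classrm_B", "classrm_A", "classrm_F"],
        "subject": ["maths", "maths", "science", "science", "english", "science"],
        "score": [98.22, 76.87, 77.25, 65.28, 86.01, 96.55],
        "class_size": [20, 30, 26, 32, 16, 25],
    }
)

# Column-wise max per name: each column is maximised independently
per_column_max = df.groupby("name").max()
print(per_column_max)

# For Steve, the top score (98.22) comes from the row with class_size 20,
# while the top class_size (32) comes from the row with score 65.28,
# so the combined result matches no original row.
steve = per_column_max.loc["Steve"]
print(steve["score"], steve["class_size"])
```

Selecting whole rows needs a row-level operation instead, such as sorting and dropping duplicates, or looking up the index of each group's maximum.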

Andreas · Accepted Answer

Keep only the rows with a class size of at least 25, sort the dataframe by score in descending order, then drop duplicate values in the "name" column, keeping only the first (highest-scoring) row for each name.

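# keep only rows where class_size is at least 25
df = df[df["class_size"] >= 25]
# highest scores first
df = df.sort_values("score", ascending=False)
# the first row left for each name is that name's top score
df = df.drop_duplicates(subset=["name"], keep="first")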

Output:

    name      group  subject  score  class_size
5   John  classrm_F  science  96.55          25
2   Mary  classrm_C  science  77.25          26
3  Steve  classrm_B  science  65.28          32
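An alternative to sorting and dropping duplicates (not part of the accepted answer) is to filter first, then use groupby with idxmax, which returns the index label of the highest score in each group; .loc then selects those complete rows. A sketch under the same assumptions (pandas installed, data copied from the question):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Steve", "John", "Mary", "Steve", "Mary", "John"],
        "group": ["classrm_A", "classrm_A", "classrm_C", "classrm_B", "classrm_A", "classrm_F"],
        "subject": ["maths", "maths", "science", "science", "english", "science"],
        "score": [98.22, 76.87, 77.25, 65.28, 86.01, 96.55],
        "class_size": [20, 30, 26, 32, 16, 25],
    }
)

# Keep only rows where class_size is at least 25
eligible = df[df["class_size"] >= 25]

# Index label of the highest score for each name, then select those full rows
best_rows = eligible.loc[eligible.groupby("name")["score"].idxmax()]
print(best_rows)
```

The rows come back ordered by name, because groupby sorts its keys by default. If two rows tie for a name's top score, idxmax returns the first one in index order.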
