Reputation: 379
I am trying to filter the following dataframe from the view seen in 'Initial dataframe' to what is displayed in 'Desired output'
Initial dataframe
name group subject score class_size
Steve classrm_A maths 98.22 20
John classrm_A maths 76.87 30
Mary classrm_C science 77.25 26
Steve classrm_B science 65.28 32
Mary classrm_A english 86.01 16
John classrm_F science 96.55 25
Return rows for unique 'name' values where score' is greatest and 'class_size' is equal or greater than 25.
Desired output:
name group subject score class_size
Steve classrm_B science 65.28 32
Mary classrm_C science 77.25 26
John classrm_F science 96.55 25
Here is what I have attempted so far.....
min_class = df["class_size"] >= 25
df = df["min_class "]
df = df.groupby(['name']).max('score')
Any help would be greatly appreciated.
Upvotes: 2
Views: 1629
Reputation: 9197
Keep only rows with class size above 25 in dataframe, then sort dataframe by score, then drop all duplicates of column "name" and keep only the first row in case of duplicates.
df = df[df["class_size"] >= 25]
df = df.sort_values("score", ascending=False)
df = df.drop_duplicates(subset=["name"], keep="first")
Output:
Out[23]:
name group subject score class_size
5 John classrm_F science 96.55 25
2 Mary classrm_C science 77.25 26
3 Steve classrm_B science 65.28 32
Upvotes: 3
Reputation: 13377
This will do the trick for you. I sense it's slightly different approach, than you tried, but I dare say it's more "pythonish".
res = df.loc[df['class_size'].ge(25)].copy()
res['rank'] = res.groupby('name')['score'].rank(ascending=False)
res = res.loc[res['rank'].eq(1)].drop('rank', axis=1)
The key here is .rank()
function with .groupby()
- .rank(ascending=False)
will essentially return ranking in a descending order, .groupby()
will make you do the ranking for each group separately.
Upvotes: 1