hk2
hk2

Reputation: 487

Find the highest value within multiple groups in python

I'm trying to find the highest salary within each decade for every given name. Below is my sample dataset and the line of code

| Bins  | Name | Salary |
|-------|------|--------|
| 1990s | Ron  | 8000   |
| 1990s | Ron  | 8500   |
| 2000s | Hary | 7000   |
| 1980s | Ron  | 6800   |
| 2010s | John | 10000  |
| 2010s | John | 21000  |

df.sort_values(by='Salary', ascending=False).groupby('Bins').reset_index()

But I keep getting error: "need to rewrap column in Series to apply key function"

My desired output should look like:

| Bins  | Name | Salary |
|-------|------|--------|
| 1980s | Ron  | 6800   |
| 1990s | Ron  | 8500   |
| 2000s | Hary | 7000   |
| 2010s | John | 21000  |

Upvotes: 0

Views: 411

Answers (2)

SeaBean
SeaBean

Reputation: 23217

You can group by ['Bins', 'Name'] and use GroupBy.first() to get the highest value within the group (since you have sorted the values in descending order, the first entry within the group is the one with the highest value):

df.sort_values(by='Salary', ascending=False).groupby(['Bins', 'Name']).first().reset_index()

Result:

    Bins  Name  Salary
0  1980s   Ron    6800
1  1990s   Ron    8500
2  2000s  Hary    7000
3  2010s  John   21000

Another way to do it is to group by ['Bins', 'Name'] and use GroupBy.max()

df.groupby(['Bins', 'Name'], as_index=False)['Salary'].max()

Same result.

Upvotes: 0

eduardokapp
eduardokapp

Reputation: 1751

This should work:

df.groupby('Bins').max().reset_index()

Upvotes: 1

Related Questions