Neil
Neil

Reputation: 8247

Pandas groupby, filter and aggregate

I have following dataframe in pandas

employee_name   age location    salary
Harish          31  Mumbai      450000
Marina          30  Mumbai      600000
Meena           31  Pune        750000
Sachin          32  Mumbai      1200000
Tarun           27  Mumbai      1400000
Mahesh          41  Pune        1500000
Satish          42  Delhi       650000
Heena           34  Delhi       800000

What I want out of this dataframe is all the employees in the age group > 30 & < 35 in all different locations earning maximum salary

My desired dataframe would be

employee_name       age     location     salary
Sachin              32      Mumbai       1200000
Meena               31      Pune         750000
Heena               34      Delhi        800000

I am doing following in pandas, but its giving an error

df.groupby('location').filter(lambda x : (x['age'] > 30) & (x['age'] < 35))['salary'].max()

How do I do it in pandas?

Upvotes: 0

Views: 125

Answers (3)

Soumendra Mishra
Soumendra Mishra

Reputation: 3653

You can try this option:

df = df.query('age > 30 & age < 35')
df = df.drop_duplicates(subset="age", keep="last")
print(df)

Upvotes: 0

BENY
BENY

Reputation: 323266

Try to use idxmax, notice filter here will not work

df.loc[df[df['age'].between(31,34)].groupby('location')['salary'].idxmax()]
Out[110]: 
  employee_name  age location   salary
7         Heena   34    Delhi   800000
3        Sachin   32   Mumbai  1200000
2         Meena   31     Pune   750000

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150745

You can just filter first, then find the rows with max values:

(df.loc[df['age'].between(31,34)]
   .sort_values('salary')
   .drop_duplicates('location', keep='last')
)

Output:

  employee_name  age location   salary
2         Meena   31     Pune   750000
7         Heena   34    Delhi   800000
3        Sachin   32   Mumbai  1200000

Upvotes: 2

Related Questions