Reputation: 8247
I have following dataframe in pandas
employee_name age location salary
Harish 31 Mumbai 450000
Marina 30 Mumbai 600000
Meena 31 Pune 750000
Sachin 32 Mumbai 1200000
Tarun 27 Mumbai 1400000
Mahesh 41 Pune 1500000
Satish 42 Delhi 650000
Heena 34 Delhi 800000
What I want out of this dataframe is all the employees in the age group > 30 & < 35 in all different locations earning maximum salary
My desired dataframe would be
employee_name age location salary
Sachin 32 Mumbai 1200000
Meena 31 Pune 750000
Heena 34 Delhi 800000
I am doing following in pandas, but its giving an error
df.groupby('location').filter(lambda x : (x['age'] > 30) & (x['age'] < 35))['salary'].max()
How do I do it in pandas?
Upvotes: 0
Views: 125
Reputation: 3653
You can try this option:
df = df.query('age > 30 & age < 35')
df = df.drop_duplicates(subset="age", keep="last")
print(df)
Upvotes: 0
Reputation: 323266
Try to use idxmax
, notice filter here will not work
df.loc[df[df['age'].between(31,34)].groupby('location')['salary'].idxmax()]
Out[110]:
employee_name age location salary
7 Heena 34 Delhi 800000
3 Sachin 32 Mumbai 1200000
2 Meena 31 Pune 750000
Upvotes: 1
Reputation: 150745
You can just filter first, then find the rows with max values:
(df.loc[df['age'].between(31,34)]
.sort_values('salary')
.drop_duplicates('location', keep='last')
)
Output:
employee_name age location salary
2 Meena 31 Pune 750000
7 Heena 34 Delhi 800000
3 Sachin 32 Mumbai 1200000
Upvotes: 2