Reputation: 19329
When I drop the duplicate John rows, specifying 'name' as the column name:
import pandas as pd
data = {'name':['Bill','Steve','John','John','John'], 'age':[21,28,22,30,29]}
df = pd.DataFrame(data)
df = df.drop_duplicates('name')
pandas drops all matching rows, keeping only the first occurrence:
age name
0 21 Bill
1 28 Steve
2 22 John
Instead, I would like to keep the row where John's age is the highest (in this example, age 30). How can I achieve this?
Upvotes: 5
Views: 63
Reputation: 210852
Try this:
In [75]: df
Out[75]:
age name
0 21 Bill
1 28 Steve
2 22 John
3 30 John
4 29 John
In [76]: df.sort_values('age').drop_duplicates('name', keep='last')
Out[76]:
age name
0 21 Bill
1 28 Steve
3 30 John
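The same idea as a standalone script, in case you want to run it outside IPython. This is only a sketch: the variable name `oldest` and the final `sort_index()` call are my additions, the latter to put the surviving rows back in their original order after sorting.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Bill', 'Steve', 'John', 'John', 'John'],
    'age': [21, 28, 22, 30, 29],
})

# Sort ascending by age so the highest age for each name comes last,
# then keep only that last row per name.
oldest = df.sort_values('age').drop_duplicates('name', keep='last')

# sort_values reorders the rows; sort_index restores the original order.
oldest = oldest.sort_index()
print(oldest)
```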
Or, if you just want the last occurrence of each name regardless of age:
In [77]: df.drop_duplicates('name', keep='last')
Out[77]:
age name
0 21 Bill
1 28 Steve
4 29 John
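If you would rather not sort the frame at all, another possible approach is to look up the index label of the maximum age within each name group and select those rows. A minimal sketch, assuming the same `df` as in the question (the names `idx` and `oldest` are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Bill', 'Steve', 'John', 'John', 'John'],
    'age': [21, 28, 22, 30, 29],
})

# For each name, find the index label of the row with the largest age.
idx = df.groupby('name')['age'].idxmax()

# Select those rows and restore the original row order.
oldest = df.loc[idx].sort_index()
print(oldest)
```

Note that when several rows of a group share the maximum age, `idxmax` returns the first of them.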
Upvotes: 4