How to drop duplicates from a DataFrame, taking into account the value of another column

Question

When I drop the duplicate "John" rows by specifying 'name' as the column:

import pandas as pd   
data = {'name':['Bill','Steve','John','John','John'], 'age':[21,28,22,30,29]}
df = pd.DataFrame(data)
df = df.drop_duplicates('name')

pandas drops all the matching rows except the first (top-most) one:

   age   name
0   21   Bill
1   28  Steve
2   22   John
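The behavior above comes from the `keep` parameter of `drop_duplicates`, whose default is `keep='first'`. A minimal sketch comparing the three documented settings on the same data:

```python
import pandas as pd

data = {'name': ['Bill', 'Steve', 'John', 'John', 'John'], 'age': [21, 28, 22, 30, 29]}
df = pd.DataFrame(data)

# keep='first' (the default): the first occurrence of each name survives
first = df.drop_duplicates('name', keep='first')

# keep='last': the last occurrence of each name survives instead
last = df.drop_duplicates('name', keep='last')

# keep=False: every row whose name appears more than once is dropped
unique_only = df.drop_duplicates('name', keep=False)

print(first.index.tolist())        # [0, 1, 2]
print(last.index.tolist())         # [0, 1, 4]
print(unique_only.index.tolist())  # [0, 1]
```

None of these settings looks at `age`, which is why an extra step is needed to keep the oldest John.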

Instead, I would like to keep the row where John's age is the highest (in this example, age 30). How can I achieve this?

MaxU - stand with Ukraine · Accepted Answer

Try this:

In [75]: df
Out[75]:
   age   name
0   21   Bill
1   28  Steve
2   22   John
3   30   John
4   29   John

In [76]: df.sort_values('age').drop_duplicates('name', keep='last')
Out[76]:
   age   name
0   21   Bill
1   28  Steve
3   30   John
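As a standalone script, the sort-then-drop approach looks roughly like this. The trailing `sort_index()` call is an optional addition, not part of the answer above: `sort_values` reorders the rows, and `sort_index` puts the survivors back in their original order (in this small example the order happens to be unchanged).

```python
import pandas as pd

data = {'name': ['Bill', 'Steve', 'John', 'John', 'John'], 'age': [21, 28, 22, 30, 29]}
df = pd.DataFrame(data)

# Sort ascending by age so that, within each name, the oldest row comes last,
# then keep only that last row for each name.
oldest = df.sort_values('age').drop_duplicates('name', keep='last')

# Optional: restore the original row order.
oldest = oldest.sort_index()

print(oldest)
```

An equivalent variant is `df.sort_values('age', ascending=False).drop_duplicates('name')`, which relies on the default `keep='first'`.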

Or, depending on your goals, simply keep the last occurrence of each name (note that this keeps John aged 29, not 30):

In [77]: df.drop_duplicates('name', keep='last')
Out[77]:
   age   name
0   21   Bill
1   28  Steve
4   29   John
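A different technique, not taken from the accepted answer, avoids sorting the whole frame: group by `name`, use `idxmax` to find the index label of the row with the highest `age` in each group, then select those rows with `.loc`. A minimal sketch, assuming the index labels are unique (true for the default `RangeIndex`):

```python
import pandas as pd

data = {'name': ['Bill', 'Steve', 'John', 'John', 'John'], 'age': [21, 28, 22, 30, 29]}
df = pd.DataFrame(data)

# For each name, the index label of the row with the highest age.
max_idx = df.groupby('name')['age'].idxmax()

# Select those rows; sort_index restores the original row order,
# since groupby returns the groups sorted by name.
oldest = df.loc[max_idx].sort_index()

print(oldest)
```

If two rows for the same name share the maximum age, `idxmax` returns the first of them.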
