Jvn

Reputation: 495

pandas: how to keep the first or last value per column with drop_duplicates

As shown below, name should be kept from the first occurrence and team from the last.

How can I accomplish this with .drop_duplicates() or otherwise?

   name  team ...
0  john  a    ...
1  mike  b    ...
2  john  c    ...

↓

   name  team ...
0  john  c    ...
1  mike  b    ...

-- Additional note about comments --

.groupby('name').agg({'team': 'last', 'country': 'first'})

With this approach, if the first row of country is NaN, a value other than the first one is returned, as shown below.

Is this because NaN is ignored? Even when first is specified and the first value is NaN, the NaN must be retained.

   name  team  country ...
0  john  a     NaN     ...
1  mike  b     Brazil  ...
2  john  c     Canada  ...

↓

   name  team  country ...
0  john  c     Canada  ...
1  mike  b     Brazil  ...

Upvotes: 0

Views: 57

Answers (1)

Rik Kraan

Reputation: 586

You can use the .groupby() function:

df.groupby('name').agg({'team': 'last'})

Be aware that the value returned per name depends on how your dataframe is sorted.
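
Regarding the NaN case in the question: the built-in 'first' and 'last' aggregations return the first or last non-null value in each group, which is why Canada appears instead of NaN. A minimal sketch of one way around this, assuming the sample data from the question (the names skipping and positional are only for illustration), is to pick values by position with iloc:

```python
import numpy as np
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "name": ["john", "mike", "john"],
    "team": ["a", "b", "c"],
    "country": [np.nan, "Brazil", "Canada"],
})

# Built-in 'first' skips NaN, so john's country comes from a later row.
skipping = df.groupby("name", as_index=False).agg(
    {"team": "last", "country": "first"}
)

# Picking by position keeps whatever sits in the first/last row, NaN included.
positional = df.groupby("name", as_index=False).agg(
    team=("team", lambda s: s.iloc[-1]),
    country=("country", lambda s: s.iloc[0]),
)

print(skipping)
print(positional)
```

Both variants still depend on row order within each name, so sort the dataframe first if "first" and "last" should mean something other than the current order.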

Upvotes: 1
