EmJ

Reputation: 4608

How to find duplicates in one column in pandas in python

I have a dataframe in which I want to keep the first occurrence of each duplicated value and remove the later ones.

For example, in the dataframe below the title column contains duplicates such as nn nn and mm mm. I want to keep only the first row for each title and drop the rest.

id title
12 nn nn
11 nn nn
10 nn nn
18 mm mm
19 nn nn
06 mm mm
08 ll ll
09 jj jj
26 ll ll 

My output should look as follows:

id title
12 nn nn
18 mm mm
08 ll ll
09 jj jj

I tried the following pandas code:

L = input_data[["id", "title"]]
L_new = L[~L.duplicated()]

However, it does not remove duplicates as I wanted.

I am happy to provide more details if needed.

Upvotes: 0

Views: 73

Answers (2)

BENY

Reputation: 323226

We can use head to take the first row of each title group:

df.groupby('title').head(1)
   id  title
0  12  nn nn
3  18  mm mm
6   8  ll ll
7   9  jj jj
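
For reference, here is a minimal, self-contained sketch of the above. The dataframe is rebuilt by hand from the question's sample (ids stored as integers, which is an assumption), and the variable names are only illustrative. It also shows drop_duplicates with subset="title", a different call that should give the same rows, and why the attempt in the question flags nothing: duplicated() with no subset compares whole rows, and every id is distinct.

```python
import pandas as pd

# Sample data rebuilt from the question; storing ids as integers is an assumption.
df = pd.DataFrame({
    "id": [12, 11, 10, 18, 19, 6, 8, 9, 26],
    "title": ["nn nn", "nn nn", "nn nn", "mm mm", "nn nn",
              "mm mm", "ll ll", "jj jj", "ll ll"],
})

# First row of each title group, kept in original row order.
first_by_group = df.groupby("title").head(1)

# Alternative: drop later rows whose title has already been seen.
first_by_dedup = df.drop_duplicates(subset="title", keep="first")

# duplicated() without subset compares entire rows; the ids all differ,
# so no row is marked as a duplicate.
whole_row_dupes = int(df.duplicated().sum())

print(first_by_dedup)
```

Both approaches keep the original index labels; add .reset_index(drop=True) if a fresh 0..n-1 index is wanted.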

Upvotes: 1

Alex Fish

Reputation: 778

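Try input_data.groupby('title', sort=False).first().reset_index(). Without sort=False the groups come back sorted by title rather than in order of first appearance, and without reset_index() title becomes the index instead of a column.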
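
A small sketch of how groupby(...).first() behaves on the question's sample, assuming the data is rebuilt by hand with integer ids (the variable names here are illustrative, not from the original post):

```python
import pandas as pd

# Sample data rebuilt from the question; integer ids are an assumption.
input_data = pd.DataFrame({
    "id": [12, 11, 10, 18, 19, 6, 8, 9, 26],
    "title": ["nn nn", "nn nn", "nn nn", "mm mm", "nn nn",
              "mm mm", "ll ll", "jj jj", "ll ll"],
})

# Default behaviour: groups are sorted by title and title becomes the index.
sorted_first = input_data.groupby("title").first()

# sort=False keeps groups in order of first appearance; reset_index()
# turns title back into a column, and the final selection restores
# the original column order.
ordered_first = (
    input_data.groupby("title", sort=False)
    .first()
    .reset_index()[["id", "title"]]
)

print(ordered_first)
```

Note that first() returns the first non-null value per column within each group, so on data with missing values it can differ from simply taking the first row.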

Upvotes: 1
