How to find duplicates in one column in pandas in Python

Question

I have a DataFrame in which I want to keep the first occurrence of each duplicated value and remove the remaining duplicates.

For example, consider the DataFrame below. The title column contains duplicates such as nn nn and mm mm. I want to remove them, keeping only the first occurrence of each.

id title
12 nn nn
11 nn nn
10 nn nn
18 mm mm
19 nn nn
06 mm mm
08 ll ll
09 jj jj
26 ll ll

My output should look as follows:

id title
12 nn nn
18 mm mm
08 ll ll
09 jj jj

I tried the following pandas code:

L = input_data[["id", "title"]]
L_new = L[~L.duplicated()]

However, this does not remove the duplicates as I wanted.

I am happy to provide more details if needed.

Alex Fish · Accepted Answer

Try input_data.groupby('title').first().
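
A few details may be worth spelling out. The attempt in the question calls duplicated() with no arguments, so it compares whole rows; because every id is different, no row counts as a duplicate. By default groupby also sorts the groups by title and moves title into the index, so the result will not be in the order shown in the question unless sort=False and as_index=False are passed. In addition, first() takes the first non-null value per column, which can differ from the first row when there are missing values. The sketch below is only an illustration: it rebuilds the sample data by hand (storing id as strings to keep the leading zeros is an assumption) and compares groupby with drop_duplicates and a duplicated() mask restricted to the title column.

```python
import pandas as pd

# Sample data from the question. Storing id as strings is an assumption
# made here so that leading zeros such as "06" are preserved.
input_data = pd.DataFrame({
    "id": ["12", "11", "10", "18", "19", "06", "08", "09", "26"],
    "title": ["nn nn", "nn nn", "nn nn", "mm mm", "nn nn",
              "mm mm", "ll ll", "jj jj", "ll ll"],
})

# groupby variant: sort=False keeps groups in order of first appearance,
# as_index=False keeps 'title' as a regular column instead of the index.
# The final selection restores the original column order.
via_groupby = (
    input_data.groupby("title", sort=False, as_index=False)
    .first()[["id", "title"]]
)

# drop_duplicates variant: subset limits the comparison to 'title',
# keep="first" retains the first row of each group of duplicates.
via_drop = input_data.drop_duplicates(subset="title", keep="first")

# Boolean-mask variant, closest to the code in the question:
# duplicated(subset=...) marks every repeat after the first occurrence.
via_mask = input_data[~input_data.duplicated(subset="title")]

print(via_drop)
```

All three variants should keep the rows with ids 12, 18, 08 and 09. drop_duplicates and the mask keep the original row labels, while the groupby variant produces a fresh 0..3 index; call reset_index(drop=True) on the former if a clean index is needed.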
