Reputation: 4608
I have a dataframe in which I want to keep the first occurrence of each duplicate and remove the remaining duplicates.
For example, consider the dataframe below. The title column contains duplicates such as nn nn, mm mm, etc. I want to remove them, keeping only the first occurrence of each.
id title
12 nn nn
11 nn nn
10 nn nn
18 mm mm
19 nn nn
06 mm mm
08 ll ll
09 jj jj
26 ll ll
My output should look as follows:
id title
12 nn nn
18 mm mm
08 ll ll
09 jj jj
I tried the following pandas code:
L = input_data[["id", "title"]]
L_new = L[~L.duplicated()]
However, it does not remove duplicates as I wanted.
I am happy to provide more details if needed.
Upvotes: 0
Views: 73
Reputation: 323226
We can use groupby with head(1), which keeps the first row of each title group:
df.groupby('title').head(1)
id title
0 12 nn nn
3 18 mm mm
6 8 ll ll
7 9 jj jj
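As a self-contained sketch of why the attempt in the question keeps every row, and of an alternative that should give the same result: duplicated() with no subset compares entire rows, and since each id is unique, no row counts as a duplicate. Restricting the comparison to title with drop_duplicates(subset="title", keep="first") is one way around that. The frame below is rebuilt from the question's sample; storing id as strings (to keep the leading zeros) is an assumption.

```python
import pandas as pd

# Sample data copied from the question; ids stored as strings (an assumption)
# so that leading zeros such as "06" are preserved.
df = pd.DataFrame({
    "id": ["12", "11", "10", "18", "19", "06", "08", "09", "26"],
    "title": ["nn nn", "nn nn", "nn nn", "mm mm", "nn nn",
              "mm mm", "ll ll", "jj jj", "ll ll"],
})

# With no subset, duplicated() compares whole rows. Every (id, title) pair
# is unique here, so nothing is flagged and nothing would be dropped.
whole_row_dupes = df.duplicated()

# Keep the first row per title, as in the answer above.
first_by_head = df.groupby("title").head(1)

# Alternative: compare on the title column only and keep the first match.
first_by_drop = df.drop_duplicates(subset="title", keep="first")

print(first_by_drop)
```

Both approaches keep the original row order and index labels, so either can be used depending on preference.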
Upvotes: 1