Kallol

Reputation: 2189

Remove duplicates from list type pandas column

I have a data frame like this:

df
col1        col2
[1,2,3]     [4,5]
[1,2,3]     [6,7]
[4,5,6]     [8,9]
[9,8,7,1]   [1,2]
[9,8,7,1]   [3,4]

Now I want to remove duplicates from col1, keeping the first row for each duplicated value, so that the data frame looks like this:

col1        col2
[1,2,3]     [4,5]
[4,5,6]     [8,9]
[9,8,7,1]   [1,2]

.drop_duplicates() does not work here, so I am looking for a pandas solution that does this more efficiently than a for loop.

Upvotes: 1

Views: 379

Answers (1)

Shubham Sharma

Reputation: 71689

We can map the lists in col1 to tuples, then use duplicated to create a boolean mask for filtering the rows:

df[~df['col1'].map(tuple).duplicated()]

           col1   col2
0     [1, 2, 3]  [4,5]
2     [4, 5, 6]  [8,9]
3  [9, 8, 7, 1]  [1,2]
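
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the same mask. The variable names mask and result are my own and are used only for illustration:

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "col1": [[1, 2, 3], [1, 2, 3], [4, 5, 6], [9, 8, 7, 1], [9, 8, 7, 1]],
    "col2": [[4, 5], [6, 7], [8, 9], [1, 2], [3, 4]],
})

# Lists cannot be hashed, so convert each list to a tuple first.
# duplicated() then marks every repeat after the first occurrence as True.
mask = df["col1"].map(tuple).duplicated()

# Keep only the rows that are not marked as repeats
result = df[~mask]
print(result)
```

The original index labels are preserved; chain .reset_index(drop=True) if a fresh 0..n-1 index is wanted.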

PS: For drop_duplicates to work, the values in the column must be hashable. Lists are mutable and therefore unhashable, whereas tuples of hashable elements can be hashed.
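
To illustrate the hashability point, here is a rough sketch. The helper is_hashable and the temporary column col1_key are hypothetical names introduced only for this example; the idea is to deduplicate on a hashable copy of col1 and then discard it:

```python
import pandas as pd

def is_hashable(value):
    """Hypothetical helper: True if Python can hash the value."""
    try:
        hash(value)
    except TypeError:
        return False
    return True

print(is_hashable([1, 2, 3]))    # a list
print(is_hashable((1, 2, 3)))    # a tuple of ints
print(is_hashable(([1, 2], 3)))  # a tuple that contains a list

df = pd.DataFrame({
    "col1": [[1, 2, 3], [1, 2, 3], [4, 5, 6]],
    "col2": [[4, 5], [6, 7], [8, 9]],
})

# Deduplicate on a temporary tuple column (col1_key is an assumed name),
# then drop it so col1 still holds the original lists.
deduped = (
    df.assign(col1_key=df["col1"].map(tuple))
      .drop_duplicates(subset="col1_key")
      .drop(columns="col1_key")
)
print(deduped)
```

Passing subset means only the tuple column is hashed, so the lists in col2 do not get in the way.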

Upvotes: 3
