Reputation: 2189
I have a data frame like this,
df
col1        col2
[1,2,3]     [4,5]
[1,2,3]     [6,7]
[4,5,6]     [8,9]
[9,8,7,1]   [1,2]
[9,8,7,1]   [3,4]
Now I want to remove duplicates from col1, keeping the first row of each set of duplicate values, so the data frame would look like:
col1        col2
[1,2,3]     [4,5]
[4,5,6]     [8,9]
[9,8,7,1]   [1,2]
Since .drop_duplicates() is not working here, I'm looking for a pandas solution that does this more efficiently than a for loop.
Upvotes: 1
Views: 379
Reputation: 71689
We can try mapping the lists in col1 to tuple, then use duplicated to create a boolean mask which can be used to filter the rows:
df[~df['col1'].map(tuple).duplicated()]
col1 col2
0 [1, 2, 3] [4,5]
2 [4, 5, 6] [8,9]
3 [9, 8, 7, 1] [1,2]
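For reference, here is the same one-liner as a self-contained sketch, rebuilding the question's data frame so it can be run end to end:

```python
import pandas as pd

# Rebuild the question's data frame. The lists in the cells are
# unhashable, which is why df.drop_duplicates() raises TypeError here.
df = pd.DataFrame({
    'col1': [[1, 2, 3], [1, 2, 3], [4, 5, 6], [9, 8, 7, 1], [9, 8, 7, 1]],
    'col2': [[4, 5], [6, 7], [8, 9], [1, 2], [3, 4]],
})

# Convert each list to a hashable tuple, mark every occurrence after
# the first as a duplicate, and invert the mask to keep first rows only.
out = df[~df['col1'].map(tuple).duplicated()]
print(out)
```

duplicated() defaults to keep='first', so the first row of each duplicate group survives and the original index (0, 2, 3) is preserved.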
PS: For drop_duplicates to work, the values in the column must be hashable, or in other words immutable.
Upvotes: 3