I have a pandas data frame with a column uniqueid. I would like to remove all duplicates from the data frame based on this column, so that all remaining observations are unique.
There is also the drop_duplicates() method on any DataFrame (see the pandas documentation). Pass the column (or list of columns) that should identify a duplicate as the subset argument:
df.drop_duplicates(subset='uniqueid', inplace=True)
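A minimal sketch of the keep parameter, which decides which of the duplicated rows survives. The sample data below is made up for illustration; it returns new frames instead of using inplace=True, so the original df is left untouched.

```python
import pandas as pd

# Hypothetical sample data: uniqueid 2 appears twice and 3 appears three times.
df = pd.DataFrame({
    'uniqueid': [1, 2, 2, 3, 3, 3],
    'value': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# Default keep='first': the first row for each uniqueid is kept.
first = df.drop_duplicates(subset='uniqueid')

# keep='last': the last row for each uniqueid is kept instead.
last = df.drop_duplicates(subset='uniqueid', keep='last')

# keep=False: every row whose uniqueid occurs more than once is dropped.
only_singletons = df.drop_duplicates(subset='uniqueid', keep=False)

print(first)
print(last)
print(only_singletons)
```

Note that the surviving rows keep their original index labels (here 0, 1, 3 for keep='first'); call reset_index(drop=True) afterwards if you need a contiguous index.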
Use the duplicated method.
Since we only care whether uniqueid (A in my example) is duplicated, select that column and call duplicated on the resulting Series. Then use ~ to invert the booleans, which keeps only the first occurrence of each value.
In [90]: df = pd.DataFrame({'A': ['a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4]})
In [91]: df
Out[91]:
A B
0 a 1
1 b 2
2 b 3
3 c 4
In [92]: df['A'].duplicated()
Out[92]:
0 False
1 False
2 True
3 False
Name: A, dtype: bool
In [93]: df.loc[~df['A'].duplicated()]
Out[93]:
A B
0 a 1
1 b 2
3 c 4
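A self-contained version of the session above, as a sketch: it reuses the same toy frame, confirms that the mask approach selects the same rows as drop_duplicates(subset='A'), and shows duplicated(keep=False), which flags every occurrence of a repeated value so that ~ keeps only values that never repeat.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4]})

# duplicated() marks every occurrence after the first as True (keep='first').
mask = df['A'].duplicated()

# ~ inverts the mask, so only the first occurrence of each value is selected.
deduped = df.loc[~mask]

# keep=False marks all occurrences of a repeated value, so ~ keeps only
# rows whose value in A appears exactly once.
never_repeated = df.loc[~df['A'].duplicated(keep=False)]

print(deduped)
print(never_repeated)
```

The boolean-mask form is handy when you want to inspect or reuse the mask (for example, df.loc[mask] shows the rows being discarded); otherwise drop_duplicates does the same selection in one call.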