Michael

Reputation: 13914

Selecting unique observations in a pandas data frame

I have a pandas data frame with a column uniqueid. I would like to remove all duplicates from the data frame based on this column, such that all remaining observations are unique.

Upvotes: 7

Views: 15067

Answers (2)

cwharland

Reputation: 6703

There is also the drop_duplicates() method, available on any DataFrame. Pass the column(s) to consider when identifying duplicates through the subset argument. By default, the first occurrence of each value is kept.

df.drop_duplicates(subset='uniqueid', inplace=True)
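The keep argument controls which copy survives. The sketch below uses a small made-up frame with the question's uniqueid column and a hypothetical value column, and shows the three settings side by side:

```python
import pandas as pd

# Illustrative data: 'uniqueid' comes from the question, 'value' is hypothetical
df = pd.DataFrame({'uniqueid': [1, 2, 2, 3], 'value': ['a', 'b', 'c', 'd']})

# keep='first' (the default): keep the first row for each uniqueid
first = df.drop_duplicates(subset='uniqueid', keep='first')

# keep='last': keep the last row for each uniqueid instead
last = df.drop_duplicates(subset='uniqueid', keep='last')

# keep=False: drop every row whose uniqueid appears more than once
no_dupes = df.drop_duplicates(subset='uniqueid', keep=False)

print(first)
print(last)
print(no_dupes)
```

Without inplace=True the method returns a new frame and leaves df unchanged. In pandas 1.0 and later you can also pass ignore_index=True to get a fresh 0..n-1 index on the result.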

Upvotes: 12

TomAugspurger

Reputation: 28946

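Use the duplicated method.

Since we only care whether uniqueid (column A in my example) is duplicated, select that column and call duplicated on the resulting Series. It returns True for every repeat after the first occurrence. Then use ~ to invert the booleans, so that only the non-duplicate rows are selected.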

In [90]: df = pd.DataFrame({'A': ['a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4]})

In [91]: df
Out[91]: 
   A  B
0  a  1
1  b  2
2  b  3
3  c  4

In [92]: df['A'].duplicated()
Out[92]: 
0    False
1    False
2     True
3    False
Name: A, dtype: bool

In [93]: df.loc[~df['A'].duplicated()]
Out[93]: 
   A  B
0  a  1
1  b  2
3  c  4
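As a quick sanity check, the boolean-mask selection should match drop_duplicates on the same column. The sketch below reuses the example frame. It also shows that duplicated accepts the same keep argument, so keep=False can be used to keep only values that occur exactly once:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'b', 'c'], 'B': [1, 2, 3, 4]})

# Mask is True for the first occurrence of each value in column A
mask = ~df['A'].duplicated()
via_mask = df.loc[mask]

# The same selection via drop_duplicates
via_method = df.drop_duplicates(subset='A')

# Both approaches keep rows 0, 1 and 3
assert via_mask.equals(via_method)

# keep=False marks every repeated value as a duplicate,
# so inverting it keeps only values that appear exactly once
singletons = df.loc[~df['A'].duplicated(keep=False)]
print(singletons)
```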

Upvotes: 10
