Morgan Sacco

Reputation: 81

Conditionally removing duplicates in pandas (Python)

Is there a way to conditionally drop duplicates (using drop_duplicates specifically) in a pandas DataFrame with about 10 columns and 400,000 rows? That is, I want to keep or drop each row based on a condition over two columns: if the combination of date (column) and store number (column) is unique, keep the row; otherwise, drop it.

Upvotes: 8

Views: 2003

Answers (1)

Zero

Reputation: 76917

Use drop_duplicates, which returns a DataFrame with duplicate rows removed, optionally considering only certain columns.

Let the initial DataFrame be:

In [34]: df
Out[34]:
  Col1 Col2  Col3
0    A    B    10
1    A    B    20
2    A    C    20
3    C    B    20
4    A    B    20

To keep unique combinations of certain columns, here 'Col1' and 'Col2':

In [35]: df.drop_duplicates(['Col1', 'Col2'])
Out[35]:
  Col1 Col2  Col3
0    A    B    10
2    A    C    20
3    C    B    20
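Note that drop_duplicates keeps the first row of each repeated combination by default (keep='first'). If, as the question describes, a row should survive only when its date/store combination is unique, passing keep=False drops every row whose combination occurs more than once. A small sketch, assuming hypothetical columns named 'date' and 'store' with made-up data (the real frame has more columns):

```python
import pandas as pd

# Hypothetical data: 'date', 'store' and 'units' are assumed column names,
# and the values are invented for illustration only.
sales = pd.DataFrame({
    "date": ["2015-01-01", "2015-01-01", "2015-01-02", "2015-01-02"],
    "store": [1, 1, 1, 2],
    "units": [5, 7, 3, 4],
})

# keep='first' (the default): one row survives per (date, store) pair
first_only = sales.drop_duplicates(subset=["date", "store"])

# keep=False: every row whose (date, store) pair appears more than once is dropped
unique_only = sales.drop_duplicates(subset=["date", "store"], keep=False)

print(first_only)   # rows 0, 2 and 3
print(unique_only)  # rows 2 and 3 only
```

keep='last' is also accepted if the last occurrence of each combination should be the one retained.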

To keep unique combinations of all columns:

In [36]: df.drop_duplicates()
Out[36]:
  Col1 Col2  Col3
0    A    B    10
1    A    B    20
2    A    C    20
3    C    B    20
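drop_duplicates() selects the same rows as filtering with the inverted boolean mask from df.duplicated(), which can be handy when the duplicate test has to be combined with other row conditions. A minimal sketch that rebuilds the example frame above:

```python
import pandas as pd

# Rebuild the example DataFrame from the session above
df = pd.DataFrame({
    "Col1": ["A", "A", "A", "C", "A"],
    "Col2": ["B", "B", "C", "B", "B"],
    "Col3": [10, 20, 20, 20, 20],
})

# duplicated() flags every repeat after the first occurrence (keep='first')
mask = df.duplicated()
print(mask.tolist())  # [False, False, False, False, True]

# Inverting the mask keeps the same rows that drop_duplicates() keeps
via_mask = df[~mask]
print(via_mask.equals(df.drop_duplicates()))  # True
```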

Upvotes: 6
