Reputation: 81
Is there a way to conditionally drop duplicates (using drop_duplicates specifically) in a pandas DataFrame with about 10 columns and 400,000 rows? That is, I want to keep only rows where the combination of two columns is unique: if the combination of date (column) and store number (column) is unique, keep the row; otherwise, drop it.
Upvotes: 8
Views: 2003
Reputation: 76917
Use drop_duplicates
to return a DataFrame with duplicate rows removed, optionally considering only certain columns.
Let the initial DataFrame be:
In [34]: df
Out[34]:
Col1 Col2 Col3
0 A B 10
1 A B 20
2 A C 20
3 C B 20
4 A B 20
If you want unique combinations of certain columns, e.g. 'Col1' and 'Col2':
In [35]: df.drop_duplicates(['Col1', 'Col2'])
Out[35]:
Col1 Col2 Col3
0 A B 10
2 A C 20
3 C B 20
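Applied to the question's scenario, this could look like the sketch below (the column names date, store, and sales and the sample data are hypothetical, taken from the question's description). Note that drop_duplicates keeps the first occurrence of each duplicated combination by default; if you literally want to drop every row whose (date, store) combination is not unique, pass keep=False:

```python
import pandas as pd

# Hypothetical frame mirroring the question's date/store columns.
df = pd.DataFrame({
    'date':  ['2013-01-01', '2013-01-01', '2013-01-02', '2013-01-02'],
    'store': [1, 1, 1, 2],
    'sales': [100, 150, 200, 250],
})

# Keep the first row of each (date, store) combination.
deduped = df.drop_duplicates(['date', 'store'])

# Keep only rows whose (date, store) combination occurs exactly once.
unique_only = df.drop_duplicates(['date', 'store'], keep=False)
```

Here deduped retains rows 0, 2, and 3 (row 1 repeats the (date, store) pair of row 0), while unique_only retains only rows 2 and 3.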
If you want unique combinations of all columns:
In [36]: df.drop_duplicates()
Out[36]:
Col1 Col2 Col3
0 A B 10
1 A B 20
2 A C 20
3 C B 20
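The keep parameter controls which member of each duplicated group survives; a short sketch using the same frame as above:

```python
import pandas as pd

# Same DataFrame as in the examples above.
df = pd.DataFrame({
    'Col1': ['A', 'A', 'A', 'C', 'A'],
    'Col2': ['B', 'B', 'C', 'B', 'B'],
    'Col3': [10, 20, 20, 20, 20],
})

# keep='last' retains the final occurrence instead of the first.
last = df.drop_duplicates(['Col1', 'Col2'], keep='last')

# keep=False drops every member of a duplicated group.
strict = df.drop_duplicates(['Col1', 'Col2'], keep=False)
```

With keep='last', the (A, B) group (rows 0, 1, 4) is represented by row 4; with keep=False, only the combinations that appear exactly once (rows 2 and 3) remain.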
Upvotes: 6