Reputation: 151
I am trying to find a way to remove all duplicated records from my DB.
For example, if I have this table (stored in a CSV file):
colA  colB
1     102
2     101
3     101
4     105
5     102
6     101
If we aggregate the table with a groupby on column colB, we get:
colB  count()
105   1
102   2
101   3
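For reference, a minimal pandas sketch of that aggregation, assuming the table has been loaded into a DataFrame named df (here it is rebuilt by hand rather than read from the actual CSV file):

import pandas as pd

# Rebuild the sample table from above; in practice this would come from
# something like pd.read_csv('data.csv') - the file name is just a placeholder.
df = pd.DataFrame({'colA': [1, 2, 3, 4, 5, 6],
                   'colB': [102, 101, 101, 105, 102, 101]})

# Count how many rows share each colB value.
print(df.groupby('colB').size())
# colB
# 101    3
# 102    2
# 105    1
# dtype: int64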
The final table I want to end up with is:
colA  colB
1     102
2     101
3     101
One more thing: it is not important which row is dropped.
Upvotes: 2
Views: 192
Reputation: 71689
Use Series.duplicated with the optional parameter keep='last':
# True for every occurrence of a colB value except the last one;
# values that occur only once are never marked.
m = df['colB'].duplicated(keep='last')

# Keep only the rows marked as duplicated.
df = df[m]

print(df)
#    colA  colB
# 0     1   102
# 1     2   101
# 2     3   101
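As a self-contained check, the same approach applied to the sample data from the question (the DataFrame is rebuilt by hand here rather than read from the CSV):

import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 3, 4, 5, 6],
                   'colB': [102, 101, 101, 105, 102, 101]})

# Mark every row whose colB value appears again later; the last occurrence
# of each value, and values that occur only once, get False.
m = df['colB'].duplicated(keep='last')

print(df[m])
#    colA  colB
# 0     1   102
# 1     2   101
# 2     3   101

Using keep='first' instead would keep the later occurrences of each duplicated value rather than the earlier ones; since it does not matter which row is dropped, either choice gives a valid result.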
Upvotes: 2