vt89

Reputation: 151

Python: Aggregate the rows using the column values and delete one row for each key

I am trying to find a way to delete one row for each key (each distinct value of a column) from my table.

For example, if I have this table (stored in a CSV file):

colA   colB
1      102
2      101
3      101
4      105
5      102
6      101

If we aggregate the table using a groupBy for the column colB, we have:

colB   count()
105    1
102    2
101    3

The final table I want to receive is:

colA   colB
1      102
2      101
3      101

One more thing: it is not important which row is dropped.

Upvotes: 2

Views: 192

Answers (1)

Shubham Sharma

Reputation: 71689

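Use Series.duplicated with the optional parameter keep='last'. It marks every occurrence of a colB value as True except the last one, so filtering with that mask drops exactly one row (the last) for each key: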

m = df['colB'].duplicated(keep='last')
df = df[m]

# print(df)

   colA  colB
0     1   102
1     2   101
2     3   101
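
For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the CSV is comma-separated (the question shows it whitespace-aligned) and loads the sample from an in-memory string; the names csv_text, mask, result and dropped are only illustrative.

```python
import io

import pandas as pd

# Sample data from the question. Assumption: the real file is comma-separated;
# in practice you would call pd.read_csv on your own file path instead.
csv_text = """colA,colB
1,102
2,101
3,101
4,105
5,102
6,101
"""
df = pd.read_csv(io.StringIO(csv_text))

# True for every row whose colB value occurs again further down,
# False only for the last occurrence of each key.
mask = df["colB"].duplicated(keep="last")

result = df[mask]    # rows that remain
dropped = df[~mask]  # the single row removed for each key

print(result)
print(dropped)
```

Note that a key with only one row (105 here) disappears completely, which matches the expected table in the question. If the first row of each key should be dropped instead, keep='first' would be the setting to try.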

Upvotes: 2
