Reputation: 151
I am trying to find a way to remove all duplicated records from my DB.
For example, if I have this table (stored in a CSV file):
colA  colB
1     102
2     101
3     101
4     105
5     102
6     101
If we aggregate the table with a groupby on column colB, we get:
colB  count()
105   1
102   2
101   3
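For reference, a minimal pandas sketch of that aggregation, assuming the table has been loaded into a DataFrame named df (here it is rebuilt by hand rather than read from the actual CSV file):

import pandas as pd

# Rebuild the sample table from above; in practice this would come from
# something like pd.read_csv('data.csv') - the file name is just a placeholder.
df = pd.DataFrame({'colA': [1, 2, 3, 4, 5, 6],
                   'colB': [102, 101, 101, 105, 102, 101]})

# Count how many rows share each colB value.
print(df.groupby('colB').size())
# colB
# 101    3
# 102    2
# 105    1
# dtype: int64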
The final table I want to end up with is:
colA  colB
1     102
2     101
3     101
One more thing: it is not important which row is dropped.
Upvotes: 2
Views: 192
Reputation: 71689
Use Series.duplicated with the optional parameter keep='last':
# True for every occurrence of a colB value except the last one;
# values that occur only once are never marked.
m = df['colB'].duplicated(keep='last')

# Keep only the rows marked as duplicated.
df = df[m]

print(df)
#    colA  colB
# 0     1   102
# 1     2   101
# 2     3   101
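As a self-contained check, the same approach applied to the sample data from the question (the DataFrame is rebuilt by hand here rather than read from the CSV):

import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 3, 4, 5, 6],
                   'colB': [102, 101, 101, 105, 102, 101]})

# Mark every row whose colB value appears again later; the last occurrence
# of each value, and values that occur only once, get False.
m = df['colB'].duplicated(keep='last')

print(df[m])
#    colA  colB
# 0     1   102
# 1     2   101
# 2     3   101

Using keep='first' instead would keep the later occurrences of each duplicated value rather than the earlier ones; since it does not matter which row is dropped, either choice gives a valid result.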
Upvotes: 2