Remove duplicate rows in Pandas (possibly by group)

Question

I have a dataset, df, with the following data:

starttime               endtime              ID  Diff  
1/10/2020 9:05:00 PM    1/10/2020 9:05:10    A    10
1/10/2020 9:05:00 PM    1/10/2020 9:05:10    A    10
1/10/2020 9:06:00 PM    1/10/2020 9:06:10    B    10

Desired outcome:

starttime               endtime              ID  Diff
1/10/2020 9:05:00 PM    1/10/2020 9:05:10    A    10
1/10/2020 9:06:00 PM    1/10/2020 9:06:10    B    10

Notice that one of the Group A rows was removed because it was an exact duplicate:

1/10/2020 9:05:00 PM    1/10/2020 9:05:10    A    10

This is the code I am using, but I am unsure what arguments to pass, or whether it is correct at all:

df.drop_duplicates(subset=None, keep=False)
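A quick way to check what `keep=False` does is to run it on the sample data. The sketch below rebuilds the table by hand, with the timestamps kept as plain strings (an assumption; the real df may hold datetimes), and compares `keep=False` with `keep="first"`:

```python
import pandas as pd

# Sample data from the question; timestamps kept as plain strings
df = pd.DataFrame({
    "starttime": ["1/10/2020 9:05:00 PM", "1/10/2020 9:05:00 PM", "1/10/2020 9:06:00 PM"],
    "endtime": ["1/10/2020 9:05:10", "1/10/2020 9:05:10", "1/10/2020 9:06:10"],
    "ID": ["A", "A", "B"],
    "Diff": [10, 10, 10],
})

# keep=False drops EVERY copy of a duplicated row, so both A rows disappear
dropped_all = df.drop_duplicates(subset=None, keep=False)

# keep="first" (the default) keeps the first copy of each duplicated row
kept_first = df.drop_duplicates(subset=None, keep="first")

print(dropped_all["ID"].tolist())  # ['B']
print(kept_first["ID"].tolist())   # ['A', 'B']
```

With `subset=None`, all columns are compared, so only rows that match in every column count as duplicates.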

Any suggestions are appreciated.

Kenan · Accepted Answer

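You can supply the column through subset, but use keep='first' (the default) instead of keep=False. With keep=False, pandas drops every copy of a duplicated row, so both A rows would be removed:

df.drop_duplicates(subset='ID', keep='first')

Note that subset='ID' keeps only the first row for each ID, even when the other columns differ. To drop only exact duplicates (all columns identical), call df.drop_duplicates() with no arguments.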
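To see why the choice of subset matters, here is a sketch that adds one hypothetical row to the sample data: same ID as the first two rows, but a different Diff, so it is not an exact duplicate.

```python
import pandas as pd

df = pd.DataFrame({
    "starttime": ["1/10/2020 9:05:00 PM", "1/10/2020 9:05:00 PM",
                  "1/10/2020 9:06:00 PM", "1/10/2020 9:07:00 PM"],
    "endtime": ["1/10/2020 9:05:10", "1/10/2020 9:05:10",
                "1/10/2020 9:06:10", "1/10/2020 9:07:20"],
    "ID": ["A", "A", "B", "A"],
    "Diff": [10, 10, 10, 20],
})
# The last row is hypothetical: ID "A" again, but not identical to rows 0 and 1

# Compare all columns: only the exact duplicate (row 1) is removed
exact = df.drop_duplicates()

# Compare only ID: the hypothetical row is removed too
per_id = df.drop_duplicates(subset="ID", keep="first")

print(len(exact))   # 3
print(len(per_id))  # 2
```

For the question as stated (remove exact duplicates), the no-argument call is the safer choice; subset='ID' fits only if one row per ID is really what you want.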
