valaripeta

Reputation: 3

How do I select pandas rows based on duplicate column values?

If I have a pandas df which looks like this:

+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
|  75  |  84  |  A   |
|  84  |  68  |  B   |
|  75  |  84  |  C   |
|  75  |  84  |  A   |
+------+------+------+

I want the output to be

+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
|  75  |  84  |  A   |
|  75  |  84  |  C   |
+------+------+------+

i.e. all rows where the values of Col1 and Col2 are the same but Col3 is different. I have tried

df[df.duplicated(['ID'], keep=False)]

But this does not identify duplicates based on similarity in only two columns.

Upvotes: 0

Views: 63

Answers (2)

Mayank Porwal

Reputation: 34056

In [288]: df[df.duplicated(['Col1', 'Col2'], keep=False)].drop_duplicates()
Out[288]: 
   Col1  Col2 Col3
0    75    84    A
2    75    84    C

Upvotes: 1

jezrael

Reputation: 862671

First get all rows that are duplicated by Col1 and Col2, then remove rows that are duplicated across all columns with DataFrame.drop_duplicates:

df = df[df.duplicated(['Col1', 'Col2'], keep=False)].drop_duplicates()
print(df)
   Col1  Col2 Col3
0    75    84    A
2    75    84    C
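A self-contained reproduction of the two-step approach above, building the question's sample DataFrame from scratch:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Col1': [75, 84, 75, 75],
    'Col2': [84, 68, 84, 84],
    'Col3': ['A', 'B', 'C', 'A'],
})

# Step 1: keep=False marks every row whose (Col1, Col2) pair
# occurs more than once, not just the later occurrences
dupes = df[df.duplicated(['Col1', 'Col2'], keep=False)]

# Step 2: drop rows that are identical across all columns,
# which removes the repeated (75, 84, A) row
result = dupes.drop_duplicates()
print(result)
#    Col1  Col2 Col3
# 0    75    84    A
# 2    75    84    C
```

Note that `keep=False` is what makes this work: the default `keep='first'` would mark only the second and later occurrences as duplicates, so the first (75, 84, A) row would be lost.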

Upvotes: 1

Related Questions