valaripeta

Reputation: 3

How do I select pandas rows based on duplicate column values?

If I have a pandas df which looks like this:

+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
|  75  |  84  |  A   |
|  84  |  68  |  B   |
|  75  |  84  |  C   |
|  75  |  84  |  A   |
+------+------+------+

I want the output to be

+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
|  75  |  84  |  A   |
|  75  |  84  |  C   |
+------+------+------+

i.e. all rows where the values of Col1 and Col2 are the same but Col3 is different. I have tried

df[df.duplicated(['ID'], keep=False)]

But this does not identify duplicates based on similarity in only two columns.

Upvotes: 0

Views: 63

Answers (2)

Mayank Porwal

Reputation: 34056

In [288]: df[df.duplicated(['Col1', 'Col2'], keep=False)].drop_duplicates()
Out[288]: 
   Col1  Col2 Col3
0    75    84    A
2    75    84    C

Upvotes: 1

jezrael

Reputation: 862671

First get all rows that are duplicated by Col1 and Col2, then remove rows that are duplicated across all columns with DataFrame.drop_duplicates:

df = df[df.duplicated(['Col1', 'Col2'], keep=False)].drop_duplicates()
print(df)
   Col1  Col2 Col3
0    75    84    A
2    75    84    C
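A self-contained reproduction of the two-step approach above, building the question's sample DataFrame from scratch:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Col1': [75, 84, 75, 75],
    'Col2': [84, 68, 84, 84],
    'Col3': ['A', 'B', 'C', 'A'],
})

# Step 1: keep=False marks every row whose (Col1, Col2) pair
# occurs more than once, not just the later occurrences
dupes = df[df.duplicated(['Col1', 'Col2'], keep=False)]

# Step 2: drop rows that are identical across all columns,
# which removes the repeated (75, 84, A) row
result = dupes.drop_duplicates()
print(result)
#    Col1  Col2 Col3
# 0    75    84    A
# 2    75    84    C
```

Note that `keep=False` is what makes this work: the default `keep='first'` would mark only the second and later occurrences as duplicates, so the first (75, 84, A) row would be lost.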

Upvotes: 1

Related Questions