Reputation: 3
If I have a pandas df which looks like this:
+--------+--------+--------+
| Col1   | Col2   | Col3   |
+--------+--------+--------+
| 75     | 84     | A      |
| 84     | 68     | B      |
| 75     | 84     | C      |
| 75     | 84     | A      |
+--------+--------+--------+
I want the output to be
+--------+--------+--------+
| Col1   | Col2   | Col3   |
+--------+--------+--------+
| 75     | 84     | A      |
| 75     | 84     | C      |
+--------+--------+--------+
i.e. keep the rows where the values of Col1 and Col2 are the same but Col3 differs. I have tried

df[df.duplicated(['ID'], keep=False)]

but this does not identify duplicates based on similarity in only two columns.
Upvotes: 0
Views: 63
Reputation: 34056
In [288]: df[df.duplicated(['Col1', 'Col2'], keep=False)].drop_duplicates()
Out[288]:
Col1 Col2 Col3
0 75 84 A
2 75 84 C
Upvotes: 1
Reputation: 862671
First get all rows that are duplicated by Col1 and Col2, and then remove duplicates across all columns with DataFrame.drop_duplicates:
df = df[df.duplicated(['Col1', 'Col2'], keep=False)].drop_duplicates()
print(df)
Col1 Col2 Col3
0 75 84 A
2 75 84 C
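Put together as a self-contained sketch (the DataFrame is rebuilt here from the sample values in the question), the recipe is: flag every row whose (Col1, Col2) pair occurs more than once, then drop rows that are identical in all three columns:

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": [75, 84, 75, 75],
    "Col2": [84, 68, 84, 84],
    "Col3": ["A", "B", "C", "A"],
})

# keep=False marks every member of a duplicated (Col1, Col2) group,
# not just the repeats, so the 84/68 row is the only one excluded
dupes = df[df.duplicated(["Col1", "Col2"], keep=False)]

# drop_duplicates (over all columns) then removes the fully
# identical repeat 75/84/A, leaving one A row and one C row
result = dupes.drop_duplicates()
print(result)
```

Passing the column subset to `duplicated` controls which columns define "duplicate", while the final `drop_duplicates()` with no subset compares whole rows.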
Upvotes: 1