I have a pandas DataFrame in Python with several rows (entries) holding integer values in its columns, for example:
A B C D E F G H
0 1 2 1 0 1 2 1 2
1 0 1 1 1 1 2 1 2
2 1 2 1 2 1 2 1 3
3 0 1 1 1 1 2 1 2
4 2 2 1 2 1 2 1 3
I would like to return just the rows that are duplicated, i.e. rows whose values are identical across all columns. The result should be:
A B C D E F G H
1 0 1 1 1 1 2 1 2
3 0 1 1 1 1 2 1 2
Thanks in advance
You need duplicated with the parameter keep=False, which flags all duplicates, combined with boolean indexing to select them:
print (df.duplicated(keep=False))
0 False
1 True
2 False
3 True
4 False
dtype: bool
df = df[df.duplicated(keep=False)]
print (df)
A B C D E F G H
1 0 1 1 1 1 2 1 2
3 0 1 1 1 1 2 1 2
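For reference, here is a self-contained sketch of the same idea that rebuilds the example frame from the question, so it can be run directly (the names mask and dupes are only illustrative):

```python
import pandas as pd

# Rebuild the example frame from the question (index 0..4, columns A..H)
df = pd.DataFrame(
    [[1, 2, 1, 0, 1, 2, 1, 2],
     [0, 1, 1, 1, 1, 2, 1, 2],
     [1, 2, 1, 2, 1, 2, 1, 3],
     [0, 1, 1, 1, 1, 2, 1, 2],
     [2, 2, 1, 2, 1, 2, 1, 3]],
    columns=list("ABCDEFGH"),
)

# keep=False flags every row that has at least one identical row elsewhere
mask = df.duplicated(keep=False)

# Boolean indexing keeps only the flagged rows (here rows 1 and 3)
dupes = df[mask]
print(dupes)
```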
If you need the duplicated rows without their first (or last) occurrence, use:
df1 = df[df.duplicated()]
# same as keep='first', the default parameter, so it can be omitted
#df1 = df[df.duplicated(keep='first')]
print (df1)
A B C D E F G H
3 0 1 1 1 1 2 1 2
df2 = df[df.duplicated(keep='last')]
print (df2)
A B C D E F G H
1 0 1 1 1 1 2 1 2
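To compare the three keep options side by side on the question's data, a small sketch could look like this (the variable names first, last and all_copies are illustrative only):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 1, 0, 1, 2, 1, 2],
     [0, 1, 1, 1, 1, 2, 1, 2],
     [1, 2, 1, 2, 1, 2, 1, 3],
     [0, 1, 1, 1, 1, 2, 1, 2],
     [2, 2, 1, 2, 1, 2, 1, 3]],
    columns=list("ABCDEFGH"),
)

# keep='first' (default): flag every copy except the first occurrence
first = df[df.duplicated()]
# keep='last': flag every copy except the last occurrence
last = df[df.duplicated(keep="last")]
# keep=False: flag every copy
all_copies = df[df.duplicated(keep=False)]

for name, frame in [("first", first), ("last", last), ("False", all_copies)]:
    print(name, list(frame.index))
```

With only one pair of identical rows (1 and 3), 'first' leaves row 3, 'last' leaves row 1, and False leaves both.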
You can use the boolean mask from duplicated, passing the parameter keep=False:
In [3]:
df[df.duplicated(keep=False)]
Out[3]:
A B C D E F G H
1 0 1 1 1 1 2 1 2
3 0 1 1 1 1 2 1 2
Here is the mask showing which rows are duplicates. Passing keep=False flags all duplicate rows; by default (keep='first') it would flag every duplicate except the first occurrence, i.e. only row 3 here:
In [4]:
df.duplicated(keep=False)
Out[4]:
0 False
1 True
2 False
3 True
4 False
dtype: bool
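A quick way to check the difference between the default mask and keep=False is sketched below. It relies on the observation that, for any group of identical rows, keep=False marks exactly the rows marked by either keep='first' or keep='last':

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 1, 0, 1, 2, 1, 2],
     [0, 1, 1, 1, 1, 2, 1, 2],
     [1, 2, 1, 2, 1, 2, 1, 3],
     [0, 1, 1, 1, 1, 2, 1, 2],
     [2, 2, 1, 2, 1, 2, 1, 3]],
    columns=list("ABCDEFGH"),
)

default_mask = df.duplicated()          # same as keep='first'
all_mask = df.duplicated(keep=False)

# The default flags only the later copy (row 3), not the first one (row 1)
print(default_mask.tolist())

# keep=False equals the union of the 'first' and 'last' masks
union = df.duplicated(keep="first") | df.duplicated(keep="last")
print(union.equals(all_mask))
```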