Reputation: 5302
Hello I am working with a DF, and I have the following question:
How can I check the number of duplicates like these:
A B C
1 2 3
1 2 3
1 1 2
2 1 2
2 2 1
3 2 1
Who can I count for examples that A duplicate is 2 because I have 3 rows duplicated with 1 and 2 rows duplicated with 2.
And how could I count that A duplicate is 1 because there is only one time that 2 rows are identical as you can see in 1 2 3
Thanks
Upvotes: 2
Views: 115
Reputation: 26686
I have understood you need duplicates per column. If so, use boolean selection to identify the first duplicate. cumsum() to get groups and get maximum in the group.
df.apply(lambda x: ((x==x.shift(-1))&(x.diff()!=0)).cumsum().max())
A 2
B 3
C 3
If you wanted duplicates along the rows, find duplicated, converst to integer and sum
((df.apply(lambda x: x.duplicated(False),axis=1)).astype(int)).sum(axis=1)
0 0
1 0
2 4
3 3
4 3
5 0
For your second part of the question, do what @Cody Gray did as follows
df.groupby(['A', 'B', 'C']).agg(lambda x: x.duplicated(keep='last').count())
A B C
1 1 2 1
2 3 2
2 1 2 1
2 1 1
3 2 1 1
Upvotes: 2
Reputation: 8015
I think this will help you to solve your problem
from pandas import DataFrame
if __name__ == '__main__':
d = {'A': [1, 1, 1, 2, 2, 3],
'B': [2, 2, 1, 1, 2, 2],
'C': [3, 3, 2, 2, 1, 1]}
df = DataFrame(d)
duplicated_rows = df[df.duplicated()]
print(duplicated_rows)
Output:
A B C
1 1 2 3
Upvotes: 2