Tlaloc-ES
Tlaloc-ES

Reputation: 5302

How can i count the number of rows duplicated?

Hello I am working with a DF, and I have the following question:

How can I check the number of duplicates like these:

A    B    C
1    2    3
1    2    3
1    1    2
2    1    2
2    2    1
3    2    1

Who can I count for examples that A duplicate is 2 because I have 3 rows duplicated with 1 and 2 rows duplicated with 2.

And how could I count that A duplicate is 1 because there is only one time that 2 rows are identical as you can see in 1 2 3

Thanks

Upvotes: 2

Views: 115

Answers (3)

wwnde
wwnde

Reputation: 26686

I have understood you need duplicates per column. If so, use boolean selection to identify the first duplicate. cumsum() to get groups and get maximum in the group.

df.apply(lambda x: ((x==x.shift(-1))&(x.diff()!=0)).cumsum().max())

A    2
B    3
C    3

If you wanted duplicates along the rows, find duplicated, converst to integer and sum

((df.apply(lambda x: x.duplicated(False),axis=1)).astype(int)).sum(axis=1)
0    0
1    0
2    4
3    3
4    3
5    0

For your second part of the question, do what @Cody Gray did as follows

df.groupby(['A', 'B', 'C']).agg(lambda x: x.duplicated(keep='last').count())

A  B  C
1  1  2    1
   2  3    2
2  1  2    1
   2  1    1
3  2  1    1

Upvotes: 2

Ahx
Ahx

Reputation: 8015

I think this will help you to solve your problem

from pandas import DataFrame

if __name__ == '__main__':
    d = {'A': [1, 1, 1, 2, 2, 3],
         'B': [2, 2, 1, 1, 2, 2],
         'C': [3, 3, 2, 2, 1, 1]}

    df = DataFrame(d)
    duplicated_rows = df[df.duplicated()]
    print(duplicated_rows)

Output:

   A  B  C
1  1  2  3

Upvotes: 2

sunakshi132
sunakshi132

Reputation: 87

df.groupby(['A','B','C']).size()

Upvotes: 3

Related Questions