Steve Steveman Man

Reputation: 33

Counting the instances of a duplicate value after using groupby for a column

I'm working on a dataset that looks like this:

col1
person1  gene1
person1  gene1
person1  gene2
person1  gene3
person1  gene4
person2  gene1
person2  gene2
person2  gene3
person2  gene4
person3  gene1
person3  gene1
person3  gene1
person3  gene2
person3  gene3
person3  gene3
person3  gene4

For each person, I want to count the genes that appear more than once, and then add those counts up across all people.

For example, in the case I presented above, person1 has gene1 duplicated, person2 has no genes duplicated, and person3 has gene1 and gene3 duplicated. Thus, I would want my code to output 3.

I know that pandas has a duplicated method: DataFrame.duplicated(subset=None, keep='first')

However, when I try to use it on my grouped dataframe, I keep getting a message telling me I need to use apply?

Thanks

I added a clarification for additional help:

person1 gene1
person1 gene1
person1 gene2
person1 gene2
person2 gene1
person2 gene1
person3 gene1
person3 gene1
person3 gene2
person3 gene2
person3 gene2

Upvotes: 1

Views: 31

Answers (1)

BENY

Reputation: 323266

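You can do this with groupby and size: count the rows for each combination of values, then count how many combinations occur more than once.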

df.groupby([*df.columns]).size().gt(1).sum()
Out[37]: 3
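
For anyone who wants to reproduce this, here is a minimal self-contained sketch built from the first sample in the question. The column names person and gene are assumptions (the question only shows a col1 header); if the person labels are actually the index in your data, calling df.reset_index() first should give an equivalent frame. The last part is an optional cross-check with DataFrame.duplicated, not part of the one-liner above.

```python
import pandas as pd

# Sample data from the question; "person" and "gene" are assumed column names.
rows = [
    ("person1", "gene1"), ("person1", "gene1"), ("person1", "gene2"),
    ("person1", "gene3"), ("person1", "gene4"),
    ("person2", "gene1"), ("person2", "gene2"), ("person2", "gene3"),
    ("person2", "gene4"),
    ("person3", "gene1"), ("person3", "gene1"), ("person3", "gene1"),
    ("person3", "gene2"), ("person3", "gene3"), ("person3", "gene3"),
    ("person3", "gene4"),
]
df = pd.DataFrame(rows, columns=["person", "gene"])

# Number of rows for each (person, gene) pair.
pair_counts = df.groupby(["person", "gene"]).size()

# True where a pair occurs more than once.
is_dup = pair_counts.gt(1)

# Overall number of duplicated (person, gene) pairs.
total = int(is_dup.sum())

# Breakdown per person: sum the boolean flags within each person.
per_person = is_dup.groupby(level="person").sum()

# Cross-check: keep only the repeated rows, then collapse them so that
# each duplicated pair is counted once no matter how often it repeats.
repeated_rows = df[df.duplicated(keep="first")]
total_via_duplicated = len(repeated_rows.drop_duplicates())

print(total)                 # 3
print(per_person.to_dict())
print(total_via_duplicated)  # 3
```

Grouping on both columns is what makes a gene count as duplicated only within the same person. A pair that appears three times (person3, gene1) still counts once, because gt(1) turns each pair into a single True/False before summing.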

Upvotes: 1
