Maku

Reputation: 47

Grouping and finding most frequent values

I have a df like this:

Protein Peptide
A        AAA
A        AAA
A        ABA
B        AAA
B        ABA
B        ABA

I need to filter my data so that, for each value in column 1 (Protein), I keep only the most frequently occurring value in column 2 (Peptide).

So the output would be like:

Protein Peptide
A        AAA
B        ABA

In reality I need the top 3 most frequently occurring values per group. How can I do this using Python and pandas?

Upvotes: 2

Views: 235

Answers (1)

Andy Hayden

Reputation: 375485

mode isn't a groupby method, but it is a Series (and DataFrame) method, so you have to pass it to apply. mode returns every tied value, so [0] takes the first one:

In [11]: df.groupby('Protein')['Peptide'].apply(lambda x: x.mode()[0])
Out[11]:
Protein
A    AAA
B    ABA
Name: Peptide, dtype: object
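
For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch. It rebuilds the frame from the question and adds reset_index so the result has the two-column shape asked for. The helper name top_peptide is only illustrative, and on a tie it simply takes the first value that mode returns.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Protein": ["A", "A", "A", "B", "B", "B"],
    "Peptide": ["AAA", "AAA", "ABA", "AAA", "ABA", "ABA"],
})

def top_peptide(peptides):
    # Hypothetical helper: mode() returns all tied values in sorted order,
    # so .iloc[0] picks the first of them
    return peptides.mode().iloc[0]

# One row per Protein, holding its most frequent Peptide
result = df.groupby("Protein")["Peptide"].apply(top_peptide).reset_index()
print(result)
```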

To get the top three, you could use value_counts in the same way. It sorts by count in descending order, so [:3] keeps the three most frequent values:

In [12]: df.groupby('Protein')['Peptide'].apply(lambda x: x.value_counts()[:3])
Out[12]:
Protein
A        AAA    2
         ABA    1
B        ABA    2
         AAA    1
dtype: int64
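
If a flat table is easier to work with than the nested Series above, one possible variant is sketched below. It assumes the same sample data as the question, uses the groupby value_counts method instead of apply, and the column name count is my own choice. How ties at the cut-off are ordered is not something this sketch controls.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Protein": ["A", "A", "A", "B", "B", "B"],
    "Peptide": ["AAA", "AAA", "ABA", "AAA", "ABA", "ABA"],
})

# Count each Peptide within each Protein; counts come back sorted
# in descending order inside every group
counts = df.groupby("Protein")["Peptide"].value_counts()

# Keep at most the first three entries per Protein, then flatten
# the MultiIndex into ordinary columns
top3 = counts.groupby(level="Protein").head(3).reset_index(name="count")
print(top3)
```

Changing head(3) to head(1) gives the single most frequent peptide per protein in the same layout.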

Upvotes: 4
