Reputation: 47
I have a df like this:
Protein Peptide
A AAA
A AAA
A ABA
B AAA
B ABA
B ABA
I need to filter my data so that, for each value in column 1 (Protein), I keep only the most frequently occurring value in column 2 (Peptide).
So the output would be like:
Protein Peptide
A AAA
B ABA
In reality I need the top 3 most frequently occurring values for each protein. How can I do this using Python and pandas?
Upvotes: 2
Views: 235
Reputation: 375485
mode isn't a groupby method, though it is a Series (and DataFrame) method, so you have to pass it to apply:
In [11]: df.groupby('Protein')['Peptide'].apply(lambda x: x.mode()[0])
Out[11]:
Protein
A AAA
B ABA
Name: Peptide, dtype: object
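For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch. It rebuilds the example frame from the question by hand and uses `.iloc[0]` instead of `[0]` to make the positional lookup explicit. The variable name `top` is only illustrative. Note that `mode()` returns tied values in sorted order, so on a tie this picks the alphabetically first peptide.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "Protein": ["A", "A", "A", "B", "B", "B"],
    "Peptide": ["AAA", "AAA", "ABA", "AAA", "ABA", "ABA"],
})

# Series.mode() returns every value tied for most frequent, sorted;
# .iloc[0] keeps the first of them for each protein.
top = (
    df.groupby("Protein")["Peptide"]
      .apply(lambda s: s.mode().iloc[0])
      .reset_index()
)
print(top)
```

The `reset_index()` call turns the result back into a two-column DataFrame shaped like the desired output in the question.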
To get the top three, you could use value_counts (in the same way):
In [12]: df.groupby('Protein')['Peptide'].apply(lambda x: x.value_counts()[:3])
Out[12]:
Protein
A        AAA    2
         ABA    1
B        ABA    2
         AAA    1
dtype: int64
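If a flat DataFrame is more convenient than the MultiIndex Series above, one possible alternative (not the approach used in this answer) is to count each (Protein, Peptide) pair with `size()`, sort by count within each protein, and keep the first `n` rows per group. The names `n`, `counts`, `top_n` and the `count` column are assumptions chosen for this sketch.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "Protein": ["A", "A", "A", "B", "B", "B"],
    "Peptide": ["AAA", "AAA", "ABA", "AAA", "ABA", "ABA"],
})

n = 3  # how many of the most frequent peptides to keep per protein

# Count occurrences of each (Protein, Peptide) pair, then order the
# rows so that the most frequent peptides come first within a protein.
counts = (
    df.groupby(["Protein", "Peptide"])
      .size()
      .reset_index(name="count")
      .sort_values(["Protein", "count"], ascending=[True, False])
)

# head(n) on a groupby keeps the first n rows of every group
top_n = counts.groupby("Protein").head(n).reset_index(drop=True)
print(top_n)
```

With the sample data each protein has only two distinct peptides, so `n = 3` returns both of them; setting `n = 1` gives the single most frequent peptide per protein.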
Upvotes: 4