SummerEla
SummerEla

Reputation: 1952

Get max of count() function on pandas groupby objects

Problem

Using pandas, I need to get back the row with the max count for each groupby object.

Dataset

I have a dataframe called "matches" that looks like this:

FeatureID gene pos 0 1_1_1 KRAS_1 6 1 1_1_1 KRAS_2 8 2 1_1_1 KRAS_3 11 3 1_1_1 NRAS_1 3 4 1_1_1 NRAS_2 11 5 1_1_1 NRAS_3 84 6 1_1_10 KRAS_1 4 7 1_1_10 KRAS_2 3 8 1_1_10 KRAS_3 14 9 1_1_10 NRAS_1 4 10 1_1_10 NRAS_2 6 11 1_1_10 NRAS_3 83

What I've tried

I need to group together the dataframe by FeatureID and then get the count of positions in each group:

matches.groupby(["FeatureID", "gene"]).count()

Which results in:

FeatureID gene 1_1_1 KRAS_1 6 KRAS_2 8 KRAS_3 11 NRAS_1 3 NRAS_2 11 NRAS_3 84 1_1_10 KRAS_1 4 KRAS_2 3 KRAS_3 14 NRAS_1 4 NRAS_2 6

Desired output:

I need to get back the row in each groupby object that contains the highest count, but I cannot figure out how to do that.

FeatureID gene count 1_1_1 NRAS_3 84 1_1_10 KRAS_3 14

Solution

The following line gives me back the gene with the max value for each groupby group:

matches.groupby(["FeatureID", "gene"]).count().sort_values("pos").groupby(level=0).tail(1)

Upvotes: 6

Views: 41515

Answers (1)

BENY
BENY

Reputation: 323266

You can do with max on level=0

matches.groupby(["FeatureID", "gene"]).count().max(level=0)

If keep both level

df.groupby(["FeatureID", "gene"]).count().sort_values().groupby(level=0).tail(1)

Upvotes: 10

Related Questions