Reputation: 1952
Using pandas, I need to get back the row with the max count for each groupby object.
I have a dataframe called "matches" that looks like this:
FeatureID gene pos
0 1_1_1 KRAS_1 6
1 1_1_1 KRAS_2 8
2 1_1_1 KRAS_3 11
3 1_1_1 NRAS_1 3
4 1_1_1 NRAS_2 11
5 1_1_1 NRAS_3 84
6 1_1_10 KRAS_1 4
7 1_1_10 KRAS_2 3
8 1_1_10 KRAS_3 14
9 1_1_10 NRAS_1 4
10 1_1_10 NRAS_2 6
11 1_1_10 NRAS_3 83
I need to group together the dataframe by FeatureID and then get the count of positions in each group:
matches.groupby(["FeatureID", "gene"]).count()
Which results in:
FeatureID gene
1_1_1 KRAS_1 6
KRAS_2 8
KRAS_3 11
NRAS_1 3
NRAS_2 11
NRAS_3 84
1_1_10 KRAS_1 4
KRAS_2 3
KRAS_3 14
NRAS_1 4
NRAS_2 6
I need to get back the row in each groupby object that contains the highest count, but I cannot figure out how to do that.
FeatureID gene count
1_1_1 NRAS_3 84
1_1_10 KRAS_3 14
The following line gives me back the gene with the max value for each groupby group:
matches.groupby(["FeatureID", "gene"]).count().sort_values("pos").groupby(level=0).tail(1)
Upvotes: 6
Views: 41515
Reputation: 323266
You can do with max
on level=0
matches.groupby(["FeatureID", "gene"]).count().max(level=0)
If keep both level
df.groupby(["FeatureID", "gene"]).count().sort_values().groupby(level=0).tail(1)
Upvotes: 10