M_S_N
M_S_N

Reputation: 2810

pandas-Max values from Multi-index DataFrame

I have a multi-index dataframe as shown below.

+-------+----+------+
|       |    | %age |
+-------+----+------+
| Group | Id |      |
| 0     | 18 | 75   |
| 0     | 30 | 12   |
| 0     | 42 | 13   |
| 1     | 18 | 12   |
| 1     | 30 | 75   |
| 1     | 42 | 13   |
| 2     | 18 | 13   |
| 2     | 30 | 12   |
| 2     | 42 | 75   |
+-------+----+------+

I want to get the max values from each of the group, but as opposed to many other questions on SO, also want to show all the index-level columns. like this:

+-------+----+------+
|       |    | %age |
| Group | Id |      |
| 0     | 18 | 75   |
| 1     | 30 | 75   |
| 2     | 42 | 75   |
+-------+----+------+


I tried this answer from here

df1 = df_pct.reset_index(level=1, drop=True)
mask = df1.index.isin(df1.groupby(level=[0])['%age'].idxmax())
df_pct[mask]

But I am getting an output that displays the whole dataframe

Upvotes: 2

Views: 271

Answers (1)

jezrael
jezrael

Reputation: 862701

For me working simplier solution - reset_index and isin should be omit and added loc for select by MultiIndex values:

df = df_pct.loc[df_pct.groupby(level=[0])['%age'].idxmax()]
print (df)
          %age
Group Id      
0     18    75
1     30    75
2     42    75

Detail:

print (df_pct.groupby(level=[0])['%age'].idxmax())
Group
0    (0, 18)
1    (1, 30)
2    (2, 42)
Name: %age, dtype: object

EDIT:

For top N with MultiIndex is possible use DataFrame.sort_values with GroupBy.head:

N = 2
df1 = (df_pct.sort_values(['Group','%age'], ascending=[True, False])
             .groupby(level=[0])['%age']
             .head(N))

print (df1)
Group  Id
0      18    75
       42    13
1      30    75
       42    13
2      42    75
       18    13
Name: %age, dtype: int64

Upvotes: 4

Related Questions