Reputation: 2810
I have a multi-index dataframe as shown below.
+-------+----+------+
| | | %age |
+-------+----+------+
| Group | Id | |
| 0 | 18 | 75 |
| 0 | 30 | 12 |
| 0 | 42 | 13 |
| 1 | 18 | 12 |
| 1 | 30 | 75 |
| 1 | 42 | 13 |
| 2 | 18 | 13 |
| 2 | 30 | 12 |
| 2 | 42 | 75 |
+-------+----+------+
I want to get the max values from each of the group, but as opposed to many other questions on SO, also want to show all the index-level columns. like this:
+-------+----+------+
| | | %age |
| Group | Id | |
| 0 | 18 | 75 |
| 1 | 30 | 75 |
| 2 | 42 | 75 |
+-------+----+------+
I tried this answer from here
df1 = df_pct.reset_index(level=1, drop=True)
mask = df1.index.isin(df1.groupby(level=[0])['%age'].idxmax())
df_pct[mask]
But I am getting an output that displays the whole dataframe
Upvotes: 2
Views: 271
Reputation: 862701
For me working simplier solution - reset_index
and isin
should be omit and added loc
for select by MultiIndex
values:
df = df_pct.loc[df_pct.groupby(level=[0])['%age'].idxmax()]
print (df)
%age
Group Id
0 18 75
1 30 75
2 42 75
Detail:
print (df_pct.groupby(level=[0])['%age'].idxmax())
Group
0 (0, 18)
1 (1, 30)
2 (2, 42)
Name: %age, dtype: object
EDIT:
For top N with MultiIndex
is possible use DataFrame.sort_values
with GroupBy.head
:
N = 2
df1 = (df_pct.sort_values(['Group','%age'], ascending=[True, False])
.groupby(level=[0])['%age']
.head(N))
print (df1)
Group Id
0 18 75
42 13
1 30 75
42 13
2 42 75
18 13
Name: %age, dtype: int64
Upvotes: 4