getglad
getglad

Reputation: 2563

Get index of the minimum of multi-index Pandas DataFrame using level

I have a Pandas DataFrame that is multiindexed and want to find the minimum value of a certain column in a subset of rows on each level, and get the entire contents of those rows.

import pandas as pd

idx = pd.MultiIndex.from_product([['v1', 'v2'],
                                  ['record' + str(i) for i in range(1, 7)]])

df = pd.DataFrame([[2., 114], [2., 1140],
                   [3., 114], [3., 1140],
                   [5., 114], [5., 1140],
                   [2., 114], [2., 1140],
                   [3., 114], [3., 1140],
                   [5., 114], [5., 1140]],
                  columns=['col1', 'col2'],
                  index=idx)

My structure:

                 col1  col2
level1 level2
v1     record1    2.0   114
       record2    2.0  1140
       record3    3.0   114
       record4    3.0  1140
       record5    5.0   114
       record6    5.0  1140
v2     record1    2.0   114
       record2    2.0  1140
       record3    3.0   114
       record4    3.0  1140
       record5    5.0   114
       record6    5.0  1140

Example desired output I want the minimum value of another column where col1 == 5:

                 col1  col2
level1 level2
v1     record5    5.0   114
v2     record5    5.0   114

I know that I can get a subset of rows by using a comparison statement.

df.ix[df['col1'] == 5]

And I also know that I can get the minimum values of a column within that subset from all levels.

df['col2'][df['col1'] == 5].min(level='level1')

And if I want to specify the level, then I can get the index of 1 row on specific level.

df.ix['v1', pay_up_file.ix['v1']['col2'][(df.ix['v1']['col1'] == 5)].idxmin()]

But I cannot figure out if there is an efficient way to get the indexes from all levels

There does not seem to be a method available along the lines of this:

df['col2'][df['col1'] == 5].idxmin(level='level1')

I can get to what I want with this:

df.ix[
  (df['col1'] == 5) & 
  (df['col2'].isin(df['col2'][df['col1'] == 5].min(level='level1').values))
]

But with everything else that is in Pandas, is there a better way to get to my output?

Upvotes: 7

Views: 5860

Answers (2)

Alexander
Alexander

Reputation: 109756

>>> (df[df.col1 == 5]
     .groupby(level=0, as_index=False).col2
     .apply(lambda group: group.nsmallest(1))
0  v1  record5    114
1  v2  record5    114
dtype: int64

Or...

>>> df[df.col1 == 5].groupby(level=0).col2.nsmallest(1)
v1  v1  record5    114
v2  v2  record5    114
dtype: int64

But I'm not sure why the first level shows twice (i.e. 'v1' 'v1' ...).

Upvotes: 1

piRSquared
piRSquared

Reputation: 294586

This should work:

df.loc[df.loc[df.col1 == 5.].groupby(level=0).col2.idxmin()]

            col1  col2
v1 record5   5.0   114
v2 record5   5.0   114

Note

I'm using idxmin as you thought you ought to. But the context matters. I'm using it following a groupby(level=0).col2.idxmin() which acts as you thought col2.idxmin(level=...) should.

Upvotes: 6

Related Questions