Djiggy

Reputation: 235

Finding occurrences of the DataFrame column with the max value in the multi-index case

I have a set of data for which I am trying to assess the influence of each parameter. My first idea is to compute the probability that a given parameter value yields the best outcome when all other parameters are held fixed, or, more generally, the probability that it lands in the best x%. Here is an example to make this clearer:

My data looks like this (but with more levels):

import pandas as pd
import numpy as np

iterables = [['a','b','c'], [1,2,3]]
np.random.seed(123)

columns_index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(data=np.random.rand(2, 9), columns=columns_index, index=['feature1', 'feature2'])

which should yield the following:

first            a                             b                      \
second           1         2         3         1         2         3   
feature1  0.696469  0.286139  0.226851  0.551315  0.719469  0.423106   
feature2  0.392118  0.343178  0.729050  0.438572  0.059678  0.398044   
first            c                      
second           1         2         3  
feature1  0.980764  0.684830  0.480932  
feature2  0.737995  0.182492  0.175452  

Now, if I am interested in 'feature2' and want to check the influence of 'first', I can do this:

df.loc['feature2'].groupby('second').max()
Out[272]: 
second
1    0.737995
2    0.343178
3    0.729050

Now, the question is: how can I get the following?

The max is obtained with:

second 1: c
second 2: a
second 3: a

so I would like to compute: a: 66.66%, b: 0%, c: 33.33%

I hope this is clear enough. I would also be very interested in any better ideas for checking the influence of the different parameters.

Thanks !

Upvotes: 1

Views: 241

Answers (2)

BENY

Reputation: 323306

Alternatively, you can try this:

df.stack().loc['feature2'].stack().groupby(level='second').apply(lambda x : x[x==x.max()])
Out[805]: 
second  second  first
1       1       c        0.737995
2       2       a        0.343178
3       3       a        0.729050
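
The filter x == x.max() keeps only the single best entry per group. The question also mentions being "in the best x%", so here is a rough sketch of how that could be generalized. The helper share_in_top and its parameter names are hypothetical (not part of pandas), and the approach ranks values within each group rather than reusing the stack-based chain above:

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
iterables = [['a', 'b', 'c'], [1, 2, 3]]
np.random.seed(123)
columns_index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.rand(2, 9), columns=columns_index,
                  index=['feature1', 'feature2'])


def share_in_top(values, fixed_level, varied_level, top_fraction):
    """Hypothetical helper: percentage of groups in which each value of
    `varied_level` ranks within the best `top_fraction` of its group,
    where groups are formed by holding `fixed_level` constant."""
    # Descending percentile rank inside each group: the best entry of a
    # group of n gets 1/n, the worst gets 1.0.
    pct_rank = values.groupby(level=fixed_level).rank(ascending=False, pct=True)
    in_top = pct_rank <= top_fraction
    # The mean of a boolean Series is the fraction of True entries.
    return in_top.groupby(level=varied_level).mean() * 100


# With three candidates per group, 0.5 keeps only the best one, so this
# reproduces the "who holds the max" percentages asked for in the question.
share = share_in_top(df.loc['feature2'], 'second', 'first', top_fraction=0.5)
print(share.round(2))

# 0.7 keeps the best two of three in each group.
share_two = share_in_top(df.loc['feature2'], 'second', 'first', top_fraction=0.7)
```

With only three values per group the threshold is coarse, so this is probably more informative on the larger data set mentioned in the question.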

Upvotes: 0

Bharath M Shetty

Reputation: 30605

Use .idxmax to get the index label of each group's maximum, i.e.:

df.loc['feature2'].groupby(level=1).idxmax()
second
1    (c, 1)
2    (a, 2)
3    (a, 3)
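
To go from these labels to the percentages asked for in the question, one possible follow-up (a sketch, not necessarily the shortest way) is to keep the 'first' part of each tuple and count how often each value wins. The variable names below are my own, and the reindex step is only there so that values that never win, such as 'b', appear with 0:

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
iterables = [['a', 'b', 'c'], [1, 2, 3]]
np.random.seed(123)
columns_index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.rand(2, 9), columns=columns_index,
                  index=['feature1', 'feature2'])

# For each value of 'second', the (first, second) label of the maximum.
winners = df.loc['feature2'].groupby(level='second').idxmax()

# Keep only the 'first' component of each winning label.
winning_first = winners.map(lambda label: label[0])

# Share of wins per 'first' value, in percent. value_counts omits values
# that never win, so reindex over every 'first' value and fill with 0.
all_first = df.columns.get_level_values('first').unique()
share = (winning_first.value_counts(normalize=True)
         .reindex(all_first, fill_value=0.0) * 100)
print(share.round(2))
```

For the sample data, 'a' wins for second = 2 and 3 and 'c' wins for second = 1, which gives the 66.66% / 0% / 33.33% split from the question.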

Upvotes: 2
