Reputation: 235
I have a data set for which I am trying to assess the influence of each parameter. My first idea is to compute the probability that a given parameter value yields the best outcome when all other parameters are held fixed, or more generally that it lands in the best x%. An example should make this clearer:
My data looks like this (but with more levels):
import pandas as pd
import numpy as np
iterables = [['a','b','c'], [1,2,3]]
np.random.seed(123)
columns_index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(data= np.random.rand(2,9), columns = columns_index, index=['feature1', 'feature2'])
which should yield the following:
first            a                             b                      \
second           1         2         3         1         2         3
feature1  0.696469  0.286139  0.226851  0.551315  0.719469  0.423106
feature2  0.392118  0.343178  0.729050  0.438572  0.059678  0.398044

first            c
second           1         2         3
feature1  0.980764  0.684830  0.480932
feature2  0.737995  0.182492  0.175452
Now, if I am interested in 'feature2' and want to check the influence of 'first', I can do this:
df.loc['feature2'].groupby('second').max()
Out[272]:
second
1 0.737995
2 0.343178
3 0.729050
Now, the question is: how can I get the following?
The max is obtained with: second 1 : c, second 2 : a, second 3 : a
so I would like to compute: a : 66.66%, b : 0%, c : 33.33%
I hope this is clear enough. I would also be very interested to hear of any better way to check the influence of the different parameters.
Thanks!
Upvotes: 1
Views: 241
Reputation: 323306
Or you can try this, which keeps, for each value of 'second', the entry equal to the group maximum:
df.stack().loc['feature2'].stack().groupby(level='second').apply(lambda x : x[x==x.max()])
Out[805]:
second second first
1 1 c 0.737995
2 2 a 0.343178
3 3 a 0.729050
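The question also asks about landing in the best x%, which a max-only filter does not cover. Here is a minimal sketch of one way to generalize, assuming a rank-based cutoff within each 'second' group. The helper name `top_k_share` and the cutoff `k` are hypothetical and not part of the original post. With `k=1` it should reproduce the percentages the question asks for.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
np.random.seed(123)
columns_index = pd.MultiIndex.from_product(
    [['a', 'b', 'c'], [1, 2, 3]], names=['first', 'second'])
df = pd.DataFrame(np.random.rand(2, 9), columns=columns_index,
                  index=['feature1', 'feature2'])


def top_k_share(s, k=1):
    """Percentage of 'second' groups in which each 'first' value ranks in the top k.

    Hypothetical helper: `s` is a Series indexed by ('first', 'second').
    """
    # Rank inside each 'second' group; rank 1 is the largest value.
    ranks = s.groupby(level='second').rank(ascending=False, method='min')
    in_top = ranks <= k
    # The mean of a boolean Series is the fraction of True entries.
    return in_top.groupby(level='first').mean() * 100


share_best = top_k_share(df.loc['feature2'], k=1)
print(share_best.round(2))
```

Raising `k` (or deriving it from a percentage of the group size) widens the cutoff. With `method='min'`, tied values share the better rank, so ties can push more than `k` entries into the top set.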
Upvotes: 0
Reputation: 30605
Use .idxmax to get the index label of each group's maximum, i.e.
df.loc['feature2'].groupby(level=1).idxmax()
second
1    (c, 1)
2    (a, 2)
3    (a, 3)
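To go from these labels to the percentages asked for in the question, one option is to count how often each 'first' value wins. This is only a sketch: the variable names (`winners`, `winning_first`, `win_share`) are illustrative, and it assumes there are no ties, since `idxmax` reports only the first maximum in each group.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
np.random.seed(123)
columns_index = pd.MultiIndex.from_product(
    [['a', 'b', 'c'], [1, 2, 3]], names=['first', 'second'])
df = pd.DataFrame(np.random.rand(2, 9), columns=columns_index,
                  index=['feature1', 'feature2'])

# For each value of 'second', the (first, second) label holding the max.
winners = df.loc['feature2'].groupby(level='second').idxmax()

# Keep only the 'first' part of each winning label.
winning_first = winners.map(lambda label: label[0])

# Share of wins per 'first' value, in percent; reindex so that
# values that never win (here 'b') appear with 0 instead of being dropped.
all_first = df.columns.get_level_values('first').unique()
win_share = (winning_first.value_counts(normalize=True)
             .reindex(all_first, fill_value=0.0) * 100)
print(win_share.round(2))
```

The `reindex` step matters because `value_counts` only lists values that actually occur, so a parameter value that never produces the maximum would otherwise be missing from the result.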
Upvotes: 2