ruben baetens
ruben baetens

Reputation: 2936

Slice pandas' MultiIndex DataFrame

To keep track of all simulation-results in a parametric run, i create a MultIndex DataFrame named dfParRun in pandas as follows:

import pandas as pd
import numpy as np
import itertools
limOpt = [0.1,1,10]
reimbOpt = ['Cash','Time']
xOpt = [0.1, .02, .03, .04, .05, .06, .07, .08]
zOpt = [1,5n10]
arrays = [limOpt, reimbOpt, xOpt, zOpt]
parameters = list(itertools.product(*arrays))
nPar = len(parameters)

variables = ['X', 'Y', 'Z']
nVar = len(variables)
index = pd.MultiIndex.from_tuples(parameters, names=['lim', 'reimb', 'xMax', 'zMax'])

dfParRun = pd.DataFrame(np.random.rand((nPar, nVar)), index=index, columns=variables)

To analyse my parametric run, i want to slice this dataframe but this seems a burden. For example, i want to have all results for xMax above 0.5 and lim equal to 10. At this moment, the only working method i find is:

df = dfParRun.reset_index()
df.loc[(df.xMax>0.5) & (df.lim==10)]

and i wonder if there is a method without resetting the index of the DataFrame ?

Upvotes: 2

Views: 165

Answers (1)

piRSquared
piRSquared

Reputation: 294506

option 1
use pd.IndexSlice
caveat: requires sort_index

dfParRun.sort_index().loc[pd.IndexSlice[10, :, .0500001:, :]]

enter image description here

option 2
use your df after having reset_index

df.query('xMax > 0.05 & lim == 10')

enter image description here


setup

import pandas as pd
import numpy as np
import itertools
limOpt = [0.1,1,10]
reimbOpt = ['Cash','Time']
xOpt = [0.1, .02, .03, .04, .05, .06, .07, .08]
zOpt = [1, 5, 10]
arrays = [limOpt, reimbOpt, xOpt, zOpt]
parameters = list(itertools.product(*arrays))
nPar = len(parameters)

variables = ['X', 'Y', 'Z']
nVar = len(variables)
index = pd.MultiIndex.from_tuples(parameters, names=['lim', 'reimb', 'xMax', 'zMax'])

dfParRun = pd.DataFrame(np.random.rand(*(nPar, nVar)), index=index, columns=variables)
df = dfParRun.reset_index()

Upvotes: 2

Related Questions