dan_g
dan_g

Reputation: 2795

Boolean indexing with MultiIndex df (pandas)

I have a MultiIndex dataframe that I'm trying to index in to based on value ranges in my columns and the outermost index levels. So, using the example below, e.g. I'm trying to select the values from v2 that are index l2 where v1 > 12

I can achieve this using multiple index statements, such as: df[df.v1>12].loc['l2', 'v2'], but this seems less than ideal. Is there a way to condense this in to a single statement?

I've been trying to figure out how to use pd.IndexSlice, but can't seem to wrap my head around what the examples in the MultiIndex section of the docs are doing.

df = pd.concat([pd.DataFrame({'v1': range(10, 15), 'v2':range(5, 0, -1)}) 
                for i in range(2)], keys=['l1', 'l2'])

      v1  v2
l1 0  10   5
   1  11   4
   2  12   3
   3  13   2
   4  14   1
l2 0  10   5
   1  11   4
   2  12   3
   3  13   2
   4  14   1

Upvotes: 3

Views: 1404

Answers (1)

jezrael
jezrael

Reputation: 862641

You can use slicers for selecting and then modified boolean indexing with loc for select column v2:

idx = pd.IndexSlice
df1 = df.loc[idx['l2', :], :]
print (df1)
      v1  v2
l2 0  10   5
   1  11   4
   2  12   3
   3  13   2
   4  14   1

print (df1.loc[df1.v1 > 12, 'v2'])
l2  3    2
    4    1
Name: v2, dtype: int32

Another solution with xs:

df1 = df.xs('l2')
print (df1)
   v1  v2
0  10   5
1  11   4
2  12   3
3  13   2
4  14   1

print (df1.loc[df1.v1 > 12, 'v2'])
3    2
4    1
Name: v2, dtype: int32

df1 = df.xs('l2', drop_level=False)
print (df1)
      v1  v2
l2 0  10   5
   1  11   4
   2  12   3
   3  13   2
   4  14   1

print (df1.loc[df1.v1 > 12, 'v2'])
l2  3    2
    4    1
Name: v2, dtype: int32

Solution with selecting first level of index by get_level_values, last if need remove first level use droplevel or reset_index:

df1 = df.loc[(df.v1 > 12) & (df.index.get_level_values(0) == 'l2'), 'v2']
df1.index = df1.index.droplevel(0)
#df1 = df1.reset_index(level=0, drop=True)
print (df1)
3    2
4    1
Name: v2, dtype: int32

Example with IndexSlice:

Select all values in first level and in second from 1 to 3 (thanks piRSquared):

idx = pd.IndexSlice
print (df.loc[idx[:, 1:3], :])
      v1  v2
l1 1  11   4
   2  12   3
   3  13   2
l2 1  11   4
   2  12   3
   3  13   2

Upvotes: 2

Related Questions