Pandas MultiIndex: Selecting a column knowing only the second index?

Question

I'm working with the following DataFrame:

   age  height  weight  shoe_size
0  8.0     6.0     2.0        1.0
1  8.0     NaN     2.0        1.0
2  6.0     1.0     4.0        NaN
3  5.0     1.0     NaN        0.0
4  5.0     NaN     1.0        NaN
5  3.0     0.0     1.0        0.0

I added another header to the df in this way:

zipped = list(zip(df.columns, ["RHS", "height", "weight", "shoe_size"]))

df.columns = pd.MultiIndex.from_tuples(zipped)

So this is the new DataFrame:

   age height weight shoe_size
   RHS height weight shoe_size
0  8.0    6.0    2.0       1.0
1  8.0    NaN    2.0       1.0
2  6.0    1.0    4.0       NaN
3  5.0    1.0    NaN       0.0
4  5.0    NaN    1.0       NaN
5  3.0    0.0    1.0       0.0

Now I know how to select the first column, by using the corresponding tuple ("age", "RHS"):

df[("age", "RHS")]

but I was wondering about how to do this by using only the second index "RHS". Ideally something like:

df[(any, "RHS")]

cs95 · Accepted Answer

You pass slice(None) as the first argument to .loc, provided you sort your columns first using df.sort_index:

In [325]: df.sort_index(1).loc[:, (slice(None), 'RHS')]
Out[325]: 
   age
   RHS
0  8.0
1  8.0
2  6.0
3  5.0
4  5.0
5  3.0

You can also use pd.IndexSlice with df.loc:

In [332]: idx = pd.IndexSlice

In [333]: df.sort_index(1).loc[:, idx[:, 'RHS']]
Out[333]: 
   age
   RHS
0  8.0
1  8.0
2  6.0
3  5.0
4  5.0
5  3.0

With the slicer, you don't need to explicitly pass slice(None) because IndexSlice does that for you.

If you don't sort your columns, you get:

UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'

If you have multiple RHS columns in the second level, all those columns are returned.

Pandas MultiIndex: Selecting a column knowing only the second index?

Answers (2)

Related Questions