Juan Carlos

Reputation: 367

Pandas MultiIndex: Selecting a column knowing only the second index?

I'm working with the following DataFrame:

   age  height  weight  shoe_size
0  8.0     6.0     2.0        1.0
1  8.0     NaN     2.0        1.0
2  6.0     1.0     4.0        NaN
3  5.0     1.0     NaN        0.0
4  5.0     NaN     1.0        NaN
5  3.0     0.0     1.0        0.0

I added another header to the df in this way:

zipped = list(zip(df.columns, ["RHS", "height", "weight", "shoe_size"]))

df.columns = pd.MultiIndex.from_tuples(zipped)

So this is the new DataFrame:

   age height weight shoe_size
   RHS height weight shoe_size
0  8.0    6.0    2.0       1.0
1  8.0    NaN    2.0       1.0
2  6.0    1.0    4.0       NaN
3  5.0    1.0    NaN       0.0
4  5.0    NaN    1.0       NaN
5  3.0    0.0    1.0       0.0

Now I know how to select the first column, by using the corresponding tuple ("age", "RHS"):

df[("age", "RHS")]

but I was wondering how to do this using only the second-level label "RHS", without knowing the first-level label. Ideally something like:

df[(any, "RHS")]

Upvotes: 0

Views: 3041

Answers (2)

Zero

Reputation: 76917

You could use get_level_values to build a boolean mask over the second column level and pass it to .loc:

In [700]: df.loc[:, df.columns.get_level_values(1) == 'RHS']
Out[700]:
   age
   RHS
0  8.0
1  8.0
2  6.0
3  5.0
4  5.0
5  3.0
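
For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the frame from the question and applies the same mask. The variable names mask and selected are only illustrative, and the data is copied from the question's table.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    "age": [8.0, 8.0, 6.0, 5.0, 5.0, 3.0],
    "height": [6.0, np.nan, 1.0, 1.0, np.nan, 0.0],
    "weight": [2.0, 2.0, 4.0, np.nan, 1.0, 1.0],
    "shoe_size": [1.0, 1.0, np.nan, 0.0, np.nan, 0.0],
})

# Add the second header level, as in the question.
zipped = list(zip(df.columns, ["RHS", "height", "weight", "shoe_size"]))
df.columns = pd.MultiIndex.from_tuples(zipped)

# Boolean array, one entry per column: True where level 1 equals "RHS".
mask = df.columns.get_level_values(1) == "RHS"

# A boolean mask does not require the columns to be sorted.
selected = df.loc[:, mask]
print(selected)
```

Because this is plain boolean indexing, it keeps the original column order and both header levels.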

Upvotes: 3

cs95

Reputation: 402263

You can pass slice(None) as the first element of the column key in .loc, provided you sort your columns first using df.sort_index:

In [325]: df.sort_index(axis=1).loc[:, (slice(None), 'RHS')]
Out[325]: 
   age
   RHS
0  8.0
1  8.0
2  6.0
3  5.0
4  5.0
5  3.0

You can also use pd.IndexSlice with df.loc:

In [332]: idx = pd.IndexSlice

In [333]: df.sort_index(axis=1).loc[:, idx[:, 'RHS']]
Out[333]: 
   age
   RHS
0  8.0
1  8.0
2  6.0
3  5.0
4  5.0
5  3.0

With the slicer, you don't need to write slice(None) explicitly, because IndexSlice turns the : into slice(None) for you.
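
To make the equivalence concrete, here is a small sketch on a cut-down, two-row version of the frame (the shortened data and the names sorted_df, by_tuple and by_slicer are assumptions for illustration). It shows that idx[:, 'RHS'] is just the tuple (slice(None), 'RHS') and that both forms select the same column.

```python
import pandas as pd

# Two-row stand-in for the question's frame, with the same column headers.
columns = pd.MultiIndex.from_tuples([
    ("age", "RHS"),
    ("height", "height"),
    ("weight", "weight"),
    ("shoe_size", "shoe_size"),
])
df = pd.DataFrame([[8.0, 6.0, 2.0, 1.0], [3.0, 0.0, 1.0, 0.0]], columns=columns)

# Sort the column MultiIndex lexicographically before slicing.
sorted_df = df.sort_index(axis=1)

idx = pd.IndexSlice
key = idx[:, "RHS"]  # plain tuple: (slice(None, None, None), 'RHS')

by_tuple = sorted_df.loc[:, (slice(None), "RHS")]
by_slicer = sorted_df.loc[:, key]

print(key)
print(by_tuple.equals(by_slicer))
```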


If you don't sort your columns, you get:

UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'

If you have multiple RHS columns in the second level, all those columns are returned.
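
As a quick illustration of that last point, here is a sketch with a hypothetical frame in which two columns share "RHS" in the second level (the column layout and values are made up for the example, not taken from the question):

```python
import pandas as pd

# Hypothetical headers: both "age" and "height" carry "RHS" in level 1.
columns = pd.MultiIndex.from_tuples([
    ("age", "RHS"),
    ("height", "RHS"),
    ("weight", "weight"),
])
df = pd.DataFrame([[8.0, 6.0, 2.0], [3.0, 0.0, 1.0]], columns=columns)

idx = pd.IndexSlice

# Every column whose second-level label is "RHS" is returned.
both = df.sort_index(axis=1).loc[:, idx[:, "RHS"]]
print(both.columns.tolist())
```

If you expect exactly one match, you may want to check the number of columns in the result before using it.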

Upvotes: 1
