Reputation: 367
I'm working with the following DataFrame:
age height weight shoe_size
0 8.0 6.0 2.0 1.0
1 8.0 NaN 2.0 1.0
2 6.0 1.0 4.0 NaN
3 5.0 1.0 NaN 0.0
4 5.0 NaN 1.0 NaN
5 3.0 0.0 1.0 0.0
I added another header to the df in this way:
zipped = list(zip(df.columns, ["RHS", "height", "weight", "shoe_size"]))
df.columns = pd.MultiIndex.from_tuples(zipped)
So this is the new DataFrame:
age height weight shoe_size
RHS height weight shoe_size
0 8.0 6.0 2.0 1.0
1 8.0 NaN 2.0 1.0
2 6.0 1.0 4.0 NaN
3 5.0 1.0 NaN 0.0
4 5.0 NaN 1.0 NaN
5 3.0 0.0 1.0 0.0
Now I know how to select the first column, by using the corresponding tuple ("age", "RHS")
:
df[("age", "RHS")]
but I was wondering about how to do this by using only the second index "RHS". Ideally something like:
df[(any, "RHS")]
Upvotes: 0
Views: 3041
Reputation: 76917
You could use get_level_values
In [700]: df.loc[:, df.columns.get_level_values(1) == 'RHS']
Out[700]:
age
RHS
0 8.0
1 8.0
2 6.0
3 5.0
4 5.0
5 3.0
Upvotes: 3
Reputation: 402263
You pass slice(None)
as the first argument to .loc
, provided you sort your columns first using df.sort_index
:
In [325]: df.sort_index(1).loc[:, (slice(None), 'RHS')]
Out[325]:
age
RHS
0 8.0
1 8.0
2 6.0
3 5.0
4 5.0
5 3.0
You can also use pd.IndexSlice
with df.loc
:
In [332]: idx = pd.IndexSlice
In [333]: df.sort_index(1).loc[:, idx[:, 'RHS']]
Out[333]:
age
RHS
0 8.0
1 8.0
2 6.0
3 5.0
4 5.0
5 3.0
With the slicer, you don't need to explicitly pass slice(None)
because IndexSlice
does that for you.
If you don't sort your columns, you get:
UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'
If you have multiple RHS
columns in the second level, all those columns are returned.
Upvotes: 1