Reputation: 2795
I have a MultiIndex dataframe that I'm trying to index in to based on value ranges in my columns and the outermost index levels. So, using the example below, e.g. I'm trying to select the values from v2
that are index l2
where v1 > 12
I can achieve this using multiple index statements, such as:
df[df.v1>12].loc['l2', 'v2']
, but this seems less than ideal. Is there a way to condense this in to a single statement?
I've been trying to figure out how to use pd.IndexSlice
, but can't seem to wrap my head around what the examples in the MultiIndex section of the docs are doing.
df = pd.concat([pd.DataFrame({'v1': range(10, 15), 'v2':range(5, 0, -1)})
for i in range(2)], keys=['l1', 'l2'])
v1 v2
l1 0 10 5
1 11 4
2 12 3
3 13 2
4 14 1
l2 0 10 5
1 11 4
2 12 3
3 13 2
4 14 1
Upvotes: 3
Views: 1404
Reputation: 862641
You can use slicers for selecting and then modified boolean indexing
with loc
for select column v2
:
idx = pd.IndexSlice
df1 = df.loc[idx['l2', :], :]
print (df1)
v1 v2
l2 0 10 5
1 11 4
2 12 3
3 13 2
4 14 1
print (df1.loc[df1.v1 > 12, 'v2'])
l2 3 2
4 1
Name: v2, dtype: int32
Another solution with xs
:
df1 = df.xs('l2')
print (df1)
v1 v2
0 10 5
1 11 4
2 12 3
3 13 2
4 14 1
print (df1.loc[df1.v1 > 12, 'v2'])
3 2
4 1
Name: v2, dtype: int32
df1 = df.xs('l2', drop_level=False)
print (df1)
v1 v2
l2 0 10 5
1 11 4
2 12 3
3 13 2
4 14 1
print (df1.loc[df1.v1 > 12, 'v2'])
l2 3 2
4 1
Name: v2, dtype: int32
Solution with selecting first level of index by get_level_values
, last if need remove first level use droplevel
or reset_index
:
df1 = df.loc[(df.v1 > 12) & (df.index.get_level_values(0) == 'l2'), 'v2']
df1.index = df1.index.droplevel(0)
#df1 = df1.reset_index(level=0, drop=True)
print (df1)
3 2
4 1
Name: v2, dtype: int32
Example with IndexSlice
:
Select all values in first level and in second from 1
to 3
(thanks piRSquared):
idx = pd.IndexSlice
print (df.loc[idx[:, 1:3], :])
v1 v2
l1 1 11 4
2 12 3
3 13 2
l2 1 11 4
2 12 3
3 13 2
Upvotes: 2