Reputation: 1042
Given that I have the following pandas DataFrame:
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
np.array([0.01, 0.2, 0.3, -0.5, 0.6, -0.7, -0.8, 0.9])]
tuples = list(zip(*arrays))
df_index = pd.MultiIndex.from_tuples(tuples, names=['A', 'B', 'measure'])
df = pd.DataFrame(np.random.randn(8, 4), index=df_index)
print(df)
How can I filter all the values where for example the measure column (which is part of the index) is greater than 0.2?
I have tried:
df.loc[:,:,0.1:0.9]
(and other variations of this, but I get the error "IndexingError: Too many indexers"
Thanks, Gerald
Upvotes: 2
Views: 5048
Reputation: 434
In line with your initial approach, you could use IndexSlice
df.sort_index().loc[pd.IndexSlice[:, :, 0.2:], :]
Upvotes: 0
Reputation: 27879
This does the trick:
df.iloc[df.index.get_level_values(2) >= 0.2]
Or if you like:
df.iloc[df.index.get_level_values('measure') >= 0.2]
Upvotes: 2
Reputation: 323316
Something like get_level_values
df[df.index.get_level_values(2)>0.2]
Out[35]:
0 1 2 3
A B measure
baz one 0.3 -0.235196 0.183122 -1.620810 0.912996
foo one 0.6 -1.456278 -1.144081 -0.872170 0.547008
qux two 0.9 0.942656 -0.435219 -0.161408 -0.451456
Upvotes: 4
Reputation: 210882
In [3]: df.query("measure > 0.2")
Out[3]:
0 1 2 3
A B measure
baz one 0.3 0.623507 0.602585 -0.792142 2.066095
foo one 0.6 0.138192 -0.159108 -1.796944 1.668463
qux two 0.9 -0.162210 -2.293951 0.602990 1.622783
or
In [6]: df.loc[pd.IndexSlice[:,:,0.200001:], :]
Out[6]:
0 1 2 3
A B measure
baz one 0.3 0.623507 0.602585 -0.792142 2.066095
foo one 0.6 0.138192 -0.159108 -1.796944 1.668463
qux two 0.9 -0.162210 -2.293951 0.602990 1.622783
Upvotes: 6