Reputation: 1823
I have the following dataframe:
import numpy as np
import pandas as pd
index = pd.MultiIndex.from_product([[1, 2], ['a', 'b', 'c'], ['a', 'b', 'c']],
names=['one', 'two', 'three'])
df = pd.DataFrame(np.random.rand(18, 3), index=index)
0 1 2
one two three
1 a b 0.002568 0.390393 0.040717
c 0.943853 0.105594 0.738587
b b 0.049197 0.500431 0.001677
c 0.615704 0.051979 0.191894
2 a b 0.748473 0.479230 0.042476
c 0.691627 0.898222 0.252423
b b 0.270330 0.909611 0.085801
c 0.913392 0.519698 0.451158
I want to select rows where combination of index levels two
and three
are (a, b)
or (b, c)
. How can I do this?
I tried df.loc[(slice(None), ['a', 'b'], ['b', 'c']), :]
but that gives me all combinations of [a, b]
and [b, c]
, including (a, c)
and (b, b)
, which aren't needed.
I tried df.loc[pd.MultiIndex.from_tuples([(None, 'a', 'b'), (None, 'b', 'c')])]
but that returns NaN
in level one
of the index.
df.loc[pd.MultiIndex.from_tuples([(None, 'a', 'b'), (None, 'b', 'c')])]
0 1 2
NaN a b NaN NaN NaN
b c NaN NaN NaN
So I thought I needed a slice at level one
, but that gives me a TypeError
:
pd.MultiIndex.from_tuples([(slice(None), 'a', 'b'), (slice(None), 'b', 'c')])
TypeError: unhashable type: 'slice'
I feel like I'm missing some simple one-liner here :).
Upvotes: 2
Views: 171
Reputation: 76997
Here's one approach with loc
and get_level_values
In [3231]: idx = df.index.get_level_values
In [3232]: df.loc[((idx('two') == 'a') & (idx('three') == 'b')) |
((idx('two') == 'b') & (idx('three') == 'c'))]
Out[3232]:
0 1 2
one two three
1 a b 0.442332 0.380669 0.832598
b c 0.458145 0.017310 0.068655
2 a b 0.933427 0.148962 0.569479
b c 0.727993 0.172090 0.384461
Generic way
In [3262]: conds = [('a', 'b'), ('b', 'c')]
In [3263]: mask = np.column_stack(
[(idx('two') == c[0]) & (idx('three') == c[1]) for c in conds]
).any(1)
In [3264]: df.loc[mask]
Out[3264]:
0 1 2
one two three
1 a b 0.442332 0.380669 0.832598
b c 0.458145 0.017310 0.068655
2 a b 0.933427 0.148962 0.569479
b c 0.727993 0.172090 0.384461
Upvotes: 1
Reputation: 210932
Use df.query():
In [174]: df.query("(two=='a' and three=='b') or (two=='b' and three=='c')")
Out[174]:
0 1 2
one two three
1 a b 0.211555 0.193317 0.623895
b c 0.685047 0.369135 0.899151
2 a b 0.082099 0.555929 0.524365
b c 0.901859 0.068025 0.742212
UPDATE: we can also generate such "query" dynamically:
In [185]: l = [('a','b'), ('b','c')]
In [186]: q = ' or '.join(["(two=='{}' and three=='{}')".format(x,y) for x,y in l])
In [187]: q
Out[187]: "(two=='a' and three=='b') or (two=='b' and three=='c')"
In [188]: df.query(q)
Out[188]:
0 1 2
one two three
1 a b 0.211555 0.193317 0.623895
b c 0.685047 0.369135 0.899151
2 a b 0.082099 0.555929 0.524365
b c 0.901859 0.068025 0.742212
Upvotes: 2