How can I select combinations of levels in a pandas Multindex?

Question

I have the following dataframe:

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[1, 2], ['a', 'b', 'c'], ['a', 'b', 'c']],
                                   names=['one', 'two', 'three'])

df = pd.DataFrame(np.random.rand(18, 3), index=index)

            0           1           2
one two three           
1   a   b   0.002568    0.390393    0.040717
        c   0.943853    0.105594    0.738587
    b   b   0.049197    0.500431    0.001677
        c   0.615704    0.051979    0.191894
2   a   b   0.748473    0.479230    0.042476
        c   0.691627    0.898222    0.252423
    b   b   0.270330    0.909611    0.085801
        c   0.913392    0.519698    0.451158

I want to select rows where combination of index levels two and three are (a, b) or (b, c). How can I do this?

I tried df.loc[(slice(None), ['a', 'b'], ['b', 'c']), :] but that gives me all combinations of [a, b] and [b, c], including (a, c) and (b, b), which aren't needed.

I tried df.loc[pd.MultiIndex.from_tuples([(None, 'a', 'b'), (None, 'b', 'c')])] but that returns NaN in level one of the index.

df.loc[pd.MultiIndex.from_tuples([(None, 'a', 'b'), (None, 'b', 'c')])]

            0   1   2
NaN a   b   NaN NaN NaN
    b   c   NaN NaN NaN

So I thought I needed a slice at level one, but that gives me a TypeError:

pd.MultiIndex.from_tuples([(slice(None), 'a', 'b'), (slice(None), 'b', 'c')])

TypeError: unhashable type: 'slice'

I feel like I'm missing some simple one-liner here :).

MaxU - stand with Ukraine · Accepted Answer

Use df.query():

In [174]: df.query("(two=='a' and three=='b') or (two=='b' and three=='c')")
Out[174]:
                      0         1         2
one two three
1   a   b      0.211555  0.193317  0.623895
    b   c      0.685047  0.369135  0.899151
2   a   b      0.082099  0.555929  0.524365
    b   c      0.901859  0.068025  0.742212

UPDATE: we can also generate such "query" dynamically:

In [185]: l = [('a','b'), ('b','c')]

In [186]: q = ' or '.join(["(two=='{}' and three=='{}')".format(x,y) for x,y in l])

In [187]: q
Out[187]: "(two=='a' and three=='b') or (two=='b' and three=='c')"

In [188]: df.query(q)
Out[188]:
                      0         1         2
one two three
1   a   b      0.211555  0.193317  0.623895
    b   c      0.685047  0.369135  0.899151
2   a   b      0.082099  0.555929  0.524365
    b   c      0.901859  0.068025  0.742212

How can I select combinations of levels in a pandas Multindex?

Answers (2)

Related Questions