Subselect by columns in MultiIndex Pandas DataFrame

Question

I have a dataframe that looks like this:

               u1  u2  u3  u4  u5  u6
level0 level1                        
foo1   x1       0   1   0   0   0   0
       x2       0   1   1   0   1   1
foo2   x3       0   1   0   1   0   1
       x4       1   0   0   0   1   1
foo3   x5       1   0   1   0   0   0
       x6       0   1   1   1   0   0
foo4   x7       1   0   0   1   0   1
       x8       0   1   1   1   0   0

I want to subselect only those indices for which u3==1. So, as output, I should get something like:

               u1  u2  u3  u4  u5  u6
level0 level1                        
foo1   
       x2       0   1   1   0   1   1
foo2   

foo3   x5       1   0   1   0   0   0
       x6       0   1   1   1   0   0
foo4   
       x8       0   1   1   1   0   0

I have tried doing:

idx  = pd.IndexSlice
df.loc[idx[:,:],'u2']==1

which gives:

level0  level1
foo1    x1         True
        x2         True
foo2    x3         True
        x4        False
foo3    x5        False
        x6         True
foo4    x7        False
        x8         True

but I don't know how to use this to index the original dataframe.

Any help appreciated.

MaxU - stand with Ukraine · Accepted Answer

you can use query() method or regular boolean indexing:

In [11]: df.query('u2 == 1')
Out[11]:
               u1  u2  u3  u4  u5  u6
level0 level1
foo1   x1       0   1   0   0   0   0
       x2       0   1   1   0   1   1
foo2   x3       0   1   0   1   0   1
foo3   x6       0   1   1   1   0   0
foo4   x8       0   1   1   1   0   0

In [12]: df.loc[df['u2'] == 1]
Out[12]:
               u1  u2  u3  u4  u5  u6
level0 level1
foo1   x1       0   1   0   0   0   0
       x2       0   1   1   0   1   1
foo2   x3       0   1   0   1   0   1
foo3   x6       0   1   1   1   0   0
foo4   x8       0   1   1   1   0   0

using .query() method also allows you to search by index levels:

In [17]: df.query("level0 in ['foo2','foo3'] and u2 == 1")
Out[17]:
               u1  u2  u3  u4  u5  u6
level0 level1
foo2   x3       0   1   0   1   0   1
foo3   x6       0   1   1   1   0   0

UPDATE:

how can I select all the u? for which the x1==1 and x3==1?

if you mean u1 and u3 then there are quite a few ways to achieve that:

In [8]: df.query("u1 == 1 and u3 == 1")
Out[8]:
               u1  u2  u3  u4  u5  u6
level0 level1
foo3   x5       1   0   1   0   0   0

In [9]: df.loc[(df['u1'] == 1) & (df['u3'] == 1)]
Out[9]:
               u1  u2  u3  u4  u5  u6
level0 level1
foo3   x5       1   0   1   0   0   0

In [10]: df.loc[df[['u1','u3']].eq(1).all(1)]
Out[10]:
               u1  u2  u3  u4  u5  u6
level0 level1
foo3   x5       1   0   1   0   0   0

Subselect by columns in MultiIndex Pandas DataFrame

Answers (1)

Related Questions