How to use pandas query() to correctly reference multiindex column headers in the query expression?

Question

With a simple (single-level) column index one can access a column in a pandas DataFrame using .query() as follows:

df1 = pd.DataFrame(np.random.rand(10,2),index=range(10),columns=['A','B'])
df1.query('A > 0.5')

I am struggling to achieve the analogous in a DataFrame with column multi-index:

df2 = pd.DataFrame(np.random.rand(10,2),index=range(10),columns=[['A','B'],['C','D']])
df2.query('(A,C) > 0.5') # fails
df2.query('"(A,C)" > 0.5') # fails
df2.query('("A","C") > 0.5') # fails

Is this doable? Thanks...

(As to the motivation: query() seems to allow for very concise selection on a row mutli-index - column single-index dataframe, for example:

df3 = pd.DataFrame(np.random.rand(6,2),index=[[0]*3+[1]*3,range(2,8)],columns=['A','B'])
df3.index.names=['one','two']
df3.query('one==0 & two<4 & A>0.5')

I would like to do something similar with a DF multi-indexed on both axes...)

How to use pandas query() to correctly reference multiindex column headers in the query expression?

Answers (1)

Related Questions