Slavoj
Slavoj

Reputation: 131

How to use pandas query() to correctly reference multiindex column headers in the query expression?

With a simple (single-level) column index one can access a column in a pandas DataFrame using .query() as follows:

df1 = pd.DataFrame(np.random.rand(10,2),index=range(10),columns=['A','B'])
df1.query('A > 0.5')

I am struggling to achieve the analogous in a DataFrame with column multi-index:

df2 = pd.DataFrame(np.random.rand(10,2),index=range(10),columns=[['A','B'],['C','D']])
df2.query('(A,C) > 0.5') # fails
df2.query('"(A,C)" > 0.5') # fails
df2.query('("A","C") > 0.5') # fails

Is this doable? Thanks...

(As to the motivation: query() seems to allow for very concise selection on a row mutli-index - column single-index dataframe, for example:

df3 = pd.DataFrame(np.random.rand(6,2),index=[[0]*3+[1]*3,range(2,8)],columns=['A','B'])
df3.index.names=['one','two']
df3.query('one==0 & two<4 & A>0.5')

I would like to do something similar with a DF multi-indexed on both axes...)

Upvotes: 13

Views: 2342

Answers (1)

cs95
cs95

Reputation: 402263

There's an open issue on github for this, but in the meantime, one suggested workaround is to refer to the column via the DataFrame variable through @ notation:

df2.query("@df2.A.C > 0.5")

This is not a perfect workaround. If your header names/levels contain spaces, you will need to remove/rename them first.

Upvotes: 8

Related Questions