Reputation: 131
With a simple (single-level) column index one can access a column in a pandas DataFrame using .query() as follows:
df1 = pd.DataFrame(np.random.rand(10,2),index=range(10),columns=['A','B'])
df1.query('A > 0.5')
I am struggling to achieve the analogous in a DataFrame with column multi-index:
df2 = pd.DataFrame(np.random.rand(10,2),index=range(10),columns=[['A','B'],['C','D']])
df2.query('(A,C) > 0.5') # fails
df2.query('"(A,C)" > 0.5') # fails
df2.query('("A","C") > 0.5') # fails
Is this doable? Thanks...
(As to the motivation: query() seems to allow for very concise selection on a row mutli-index - column single-index dataframe, for example:
df3 = pd.DataFrame(np.random.rand(6,2),index=[[0]*3+[1]*3,range(2,8)],columns=['A','B'])
df3.index.names=['one','two']
df3.query('one==0 & two<4 & A>0.5')
I would like to do something similar with a DF multi-indexed on both axes...)
Upvotes: 13
Views: 2342
Reputation: 402263
There's an open issue on github for this, but in the meantime, one suggested workaround is to refer to the column via the DataFrame variable through @
notation:
df2.query("@df2.A.C > 0.5")
This is not a perfect workaround. If your header names/levels contain spaces, you will need to remove/rename them first.
Upvotes: 8