BENY

Reputation: 323226

Using df.query on MultiIndex gives UndefinedVariableError

I have two dataframes

df
Out[162]: 
          colA  colB
L0 L1 L2            
A1 B1 C1     1     2
      C2     3     4
   B2 C1     5     6
      C2     7     8
A2 B3 C1     9    10
      C2    11    12
   B4 C1    13    14
      C2    15    16

df1
Out[166]: 
               rate
from to            
CHF  CHF   1.000000
     MXN  19.673256
     ZAR   0.000000
     XAU   0.000775
     THB  32.961405

When I run

df.query('L0=="A1" & L2=="C1"')
Out[167]: 
          colA  colB
L0 L1 L2            
A1 B1 C1     1     2
   B2 C1     5     6

This gives me the expected output.

Then I tried to apply the same approach to df1:

df1.query('ilevel_0=="CHF" & ilevel_1=="MXN"') 

and

df1.query('from=="CHF" & to=="MXN"') 

Both fail.

What is going wrong here?


Data input:

#df
{'colA': {('A1', 'B1', 'C1'): 1,
  ('A1', 'B1', 'C2'): 3,
  ('A1', 'B2', 'C1'): 5,
  ('A1', 'B2', 'C2'): 7,
  ('A2', 'B3', 'C1'): 9,
  ('A2', 'B3', 'C2'): 11,
  ('A2', 'B4', 'C1'): 13,
  ('A2', 'B4', 'C2'): 15},
 'colB': {('A1', 'B1', 'C1'): 2,
  ('A1', 'B1', 'C2'): 4,
  ('A1', 'B2', 'C1'): 6,
  ('A1', 'B2', 'C2'): 8,
  ('A2', 'B3', 'C1'): 10,
  ('A2', 'B3', 'C2'): 12,
  ('A2', 'B4', 'C1'): 14,
  ('A2', 'B4', 'C2'): 16}}


#df1
{'rate': {('CHF', 'CHF'): 1.0,
  ('CHF', 'MXN'): 19.673256,
  ('CHF', 'THB'): 32.961405,
  ('CHF', 'XAU'): 0.000775,
  ('CHF', 'ZAR'): 0.0}}

Upvotes: 3

Views: 1492

Answers (1)

cs95

Reputation: 402263

Consider -

df1

               rate
from to            
CHF  CHF   1.000000
     MXN  19.673256
     THB  32.961405
     XAU   0.000775
     ZAR   0.000000

First, df1.query('ilevel_0=="CHF" & ilevel_1=="MXN"') fails because the index levels already have names. The ilevel_* names are only assigned to index levels that have no name. Here the levels are named from and to, so ilevel_0 and ilevel_1 are undefined and the command raises an UndefinedVariableError.
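The difference between named and unnamed levels can be checked directly. The following is a minimal sketch, assuming df1 is rebuilt from the question's data with the level names from and to (the input dict in the question does not carry the names, so they are set explicitly here). The variable names named_ok, unnamed and hit are only for illustration.

```python
import pandas as pd

# Rebuild df1 from the question, with the level names shown in its output.
idx = pd.MultiIndex.from_tuples(
    [("CHF", "CHF"), ("CHF", "MXN"), ("CHF", "THB"), ("CHF", "XAU"), ("CHF", "ZAR")],
    names=["from", "to"],
)
df1 = pd.DataFrame({"rate": [1.0, 19.673256, 32.961405, 0.000775, 0.0]}, index=idx)

# Named levels: ilevel_0 / ilevel_1 are not defined, so the lookup fails.
try:
    df1.query('ilevel_0 == "CHF" & ilevel_1 == "MXN"')
    named_ok = True
except NameError:  # UndefinedVariableError is a subclass of NameError
    named_ok = False

# Unnamed levels: the positional ilevel_* names become available.
unnamed = df1.rename_axis([None, None])
hit = unnamed.query('ilevel_0 == "CHF" & ilevel_1 == "MXN"')
print(named_ok)
print(hit)
```

Catching NameError rather than the pandas-specific exception class keeps the sketch independent of where that class lives in a given pandas version.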

Next, df1.query('from=="CHF" & to=="MXN"') fails because from is a keyword in Python: when pandas evaluates the expression, from == ... is invalid syntax. One workaround is to rename the level -

df1.rename_axis(['frm', 'to']).query("frm == 'CHF' and to == 'MXN'")


              rate
frm to            
CHF MXN  19.673256

Another is to remove the level names, so that the ilevel_* names become available -

df1.rename_axis([None, None]).query("ilevel_0 == 'CHF' and ilevel_1 == 'MXN'") 

              rate
CHF MXN  19.673256
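The keyword problem behind the second failure can be confirmed with the standard library's keyword module. This is a small sketch, assuming a two-row cut of df1 with the level names from and to. The names parsed, renamed and hit are illustrative only.

```python
import keyword

import pandas as pd

# A reduced df1 with the same level names as in the question.
idx = pd.MultiIndex.from_tuples(
    [("CHF", "CHF"), ("CHF", "MXN")], names=["from", "to"]
)
df1 = pd.DataFrame({"rate": [1.0, 19.673256]}, index=idx)

# "from" is reserved by Python, "to" is not.
from_is_keyword = keyword.iskeyword("from")
to_is_keyword = keyword.iskeyword("to")

# The query string is parsed as a Python expression, so the keyword
# makes the whole expression a syntax error.
try:
    df1.query('from == "CHF" & to == "MXN"')
    parsed = True
except SyntaxError:
    parsed = False

# Renaming the level to a non-keyword lets the same filter parse.
renamed = df1.rename_axis(["frm", "to"])
hit = renamed.query("frm == 'CHF' and to == 'MXN'")
print(from_is_keyword, to_is_keyword, parsed)
```

Any level name for which keyword.iskeyword returns True (for example class, in or import) would be expected to hit the same parsing problem.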

Keep in mind that query has a number of limitations, most of them restrictions on which names can be used as identifiers inside the expression string.
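When the level names cannot be changed, the same rows can be selected without query at all, which avoids the naming restrictions. This is one possible sketch, not the only way to do it, assuming df1 has levels named from and to as in the question. It uses Index.get_level_values to build a boolean mask, and .loc with the full key as an alternative.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("CHF", "CHF"), ("CHF", "MXN"), ("CHF", "THB"), ("CHF", "XAU"), ("CHF", "ZAR")],
    names=["from", "to"],
)
df1 = pd.DataFrame({"rate": [1.0, 19.673256, 32.961405, 0.000775, 0.0]}, index=idx)

# Level names are passed as plain strings here, so a keyword such as
# "from" is not a problem.
mask = (df1.index.get_level_values("from") == "CHF") & (
    df1.index.get_level_values("to") == "MXN"
)
by_mask = df1[mask]

# Selecting by the full key; the list keeps the result a DataFrame.
by_key = df1.loc[[("CHF", "MXN")]]
print(by_mask)
```

The mask version generalises to conditions on a subset of levels, while the .loc version needs the complete key for each row.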

Upvotes: 4
