Reputation: 1182
I've run into a situation where I can query a pandas dataframe with column names and string literals in the query string
import pandas as pd
data = pd.DataFrame([list('aba'),
list('xyz')],
columns=['A', 'B', 'C'])
data.query("A == 'a'")
# A B C
# 0 a b a
data.query("A.eq('a')")
# A B C
# 0 a b a
but when I use a variable substitution instead of the column name, I get errors.
x = 'A'
data.query("@x == 'a'")
# KeyError: 'False: boolean label can not be used without a boolean index'
data.query("@x.eq('a')")
# AttributeError: 'str' object has no attribute 'eq'
What's going on?
Upvotes: 1
Views: 621
Reputation: 13407
Variables referenced by @
in pandas.eval
are supposed to reference variables in the local namespace- not columns within the DataFrame
. So it interprets variables prefixed by @
as non-literals. (e.g. "@x.eq('a')"
is expanded to "'A'.eq('a')"
, whereas you want "A.eq('a')"
)
https://pandas.pydata.org/docs/user_guide/enhancingperf.html#local-variables
If you want to substitute column names in and have pandas parse them as such, you can use the string formatting method, or f-string
.
import pandas as pd
data = pd.DataFrame([list("aba"), list("xyz")], columns=["A", "B", "C"])
x = "A"
data.query(f'{x} == "a"')
A B C
0 a b a
This way the string is interpolated before pandas
gets a hold of it and it can process those variables referring to a column.
Upvotes: 4