Reputation: 324
I am considering replacing my use of pandas
with polars
in a tool that allows users to input predicate expressions for filtering/subsetting data rows. This allows users to use expressions that the pandas.DataFrame.query
method can parse, such as "x > 1"
, as a very simple example.
However, I can't seem to find a way to use the same types of string expressions with polars.DataFrame.filter
so that I can swap out pandas
for polars
without requiring users to change their predicate expressions.
The only thing I've found that's close to my question is this posting: String as a condition in a filter
Unfortunately, that's not quite what I need, as it still requires a string expression like "pl.col('x') > 1"
rather than simply "x > 1"
.
Is there a way to use the simpler ("agnostic") syntax with polars
?
Using the example from the polars.DataFrame.filter
docs:
>>> df = pl.DataFrame(
... {
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ["a", "b", "c"],
... }
... )
When calling df.filter
, I'm forced to use expressions like the following:
pl.col("foo") < 3
(pl.col("foo") < 3) & (pl.col("ham") == "a")
However, I want to be able to use the following string expressions instead, respectively, so that the users of the tool (currently using pandas
) do not have to be aware of the polars
-specific syntax (thus allowing me to swap libraries without impacting users):
"foo < 3"
"foo < 3 & ham == 'a'"
When I attempt to do so, here's what happens, which is puzzling since str
is one of the supported types for the predicate argument, so it is unclear as to the syntax supported for str
predicates since the docs do not show any examples of such:
>>> df.filter("foo < 3")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Caskroom/miniconda/base/envs/gedi_subset/lib/python3.10/site-packages/polars/internals/dataframe/frame.py", line 2565, in filter
self.lazy()
File "/usr/local/Caskroom/miniconda/base/envs/gedi_subset/lib/python3.10/site-packages/polars/utils.py", line 391, in wrapper
return fn(*args, **kwargs)
File "/usr/local/Caskroom/miniconda/base/envs/gedi_subset/lib/python3.10/site-packages/polars/internals/lazyframe/frame.py", line 1165, in collect
return pli.wrap_df(ldf.collect())
exceptions.NotFoundError: foo < 3
What I was expecting was the same return value that df.filter(pl.col("foo") < 3)
would return.
Upvotes: 3
Views: 977
Reputation: 14630
You could try to use the pl.sql
for that.
import polars as pl
df = pl.DataFrame(
{
"foo": [1, 2, 3],
"bar": [6, 7, 8],
"ham": ["a", "b", "c"],
}
)
string_expr = "foo < 3 and ham = 'a'"
pl.sql(f"""
SELECT * FROM df
WHERE {string_expr}
""").collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
└─────┴─────┴─────┘
Note that the SQL
language doesn't use the bitwise &
nor equality ==
the same way as pandas, so you might need to replace &
with and
and ==
with =
.
Upvotes: 4