Reputation: 363
I'm trying to execute SQL query on Polars dataframes using SQLContext and below is my code:
ctx = pl.SQLContext().register_many(
{"tbl1": df_source, "tbl2": df_reference})
src_df = ctx.execute(pl_sql_query, eager=True)
Here the schema of df_source contains a column named json_message of type Struct with Key Value pairs i.e.,
('json_message', Struct({'id': Int64, 'name': String, 'age': Int64, 'dob': String}))
My sql query to access the struct field is:
pl_sql_query =
"select json_message.id as id, json_message.name as name
from tbl1
where json_message.id in (select id from tbl2)"
When i execute this query, i'm getting an exception no table or alias named 'json_message' found
Not sure how exactly we need to access the struct field value. Tried struct.with_fields but not able to access the value.
Can someone please help me on this?
Upvotes: 4
Views: 541
Reputation: 881
Thanks for making the issue; I have just landed a PR that addresses this, enabling Polars struct-field access via the SQL interface: https://github.com/pola-rs/polars/pull/17109 👍
Upvotes: 6
Reputation: 1273
Unfortunately this doesn't look to be possible in Polars at the moment. However, the SQL functionality is being quite actively developed, so I would think this would be an accepted feature request. I will make it over the next few days if you don't get there first.
EDIT: I see that you've created a feature request and already mentioned there that you are using DuckDB as a workaround.
As for a workaround until (hopefully) this is implemented natively in Polars, DuckDB is probably the best option.
import duckdb
import polars as pl
df_source = pl.DataFrame({
"json_message": {
"id": 1,
"name": "Manuel Neuer",
"age": 38,
"dob": "1986-03-27",
}
})
df_reference = pl.DataFrame({"id": 1})
# I updated the SQL to refer to `df_source` and `df_reference`
# as DuckDB checks variables in scope. `.pl()` converts back to Polars.
duckdb.sql(
"""
select json_message.id as id, json_message.name as name
from df_source
where json_message.id in (select id from df_reference)
"""
).pl()
# ┌─────┬──────────────┐
# │ id ┆ name │
# │ --- ┆ --- │
# │ i64 ┆ str │
# ╞═════╪══════════════╡
# │ 1 ┆ Manuel Neuer │
# └─────┴──────────────┘
P.S. Including a fully reproducible example will save people having to come up with dummy data and will make it even easier to help you!
Upvotes: 3