Reputation: 437
I have a dataframe that needs to be joined with the result set from a query. The query uses a column from a dataframe to filter the data in the database.
data_list = list(df['needed_column'])
I would like to use the variable in an sql query executed in a Jupyter sql cell.
%%sql
SELECT
column_1,
column_2,
column_3
FROM my_database.my_table
WHERE
column_1 IN data_list
Is there anyway that this can be done?
Upvotes: 2
Views: 3523
Reputation: 1
You can use double brackets, just check Parameterizing SQL queries in the jupysql documentation.
It will look something like this:
sex = "MALE"
%%sql
SELECT *
FROM penguins.csv
WHERE sex = '{{sex}}'
Upvotes: 0
Reputation: 480
for me it is working with single curly braces:
%sql SELECT {dynamic_column} FROM penguins.csv LIMIT {dynamic_limit}
The only thing to be aware if you're working with dynamic strings put them in single quotes in query. (WHERE name = '{custom_name}')
Upvotes: 0
Reputation: 437
An workaround would be to execute the query inline as a variable.
data_list = str(list(df['needed_column']).replace('[', '(').replace(']', ')')
query_string = f"""
SELECT
column_1,
column_2,
column_3
FROM my_database.my_table
WHERE
column_1 IN {data_list}
"""
result_set = %sql $query_string
Upvotes: 2