drake10k
drake10k

Reputation: 437

Python variables in Jupyter SQL cells

I have a dataframe that needs to be joined with the result set from a query. The query uses a column from a dataframe to filter the data in the database.

data_list = list(df['needed_column'])

I would like to use the variable in an sql query executed in a Jupyter sql cell.

%%sql
SELECT
    column_1,
    column_2,
    column_3
FROM my_database.my_table
WHERE
    column_1 IN data_list

Is there anyway that this can be done?

Upvotes: 2

Views: 3523

Answers (3)

Paulo Albuquerque
Paulo Albuquerque

Reputation: 1

You can use double brackets, just check Parameterizing SQL queries in the jupysql documentation.

It will look something like this:

sex = "MALE"
%%sql
SELECT *
FROM penguins.csv
WHERE sex = '{{sex}}'

Upvotes: 0

Gomiunik
Gomiunik

Reputation: 480

for me it is working with single curly braces:

%sql SELECT {dynamic_column} FROM penguins.csv LIMIT {dynamic_limit}

The only thing to be aware if you're working with dynamic strings put them in single quotes in query. (WHERE name = '{custom_name}')

Upvotes: 0

drake10k
drake10k

Reputation: 437

An workaround would be to execute the query inline as a variable.

data_list = str(list(df['needed_column']).replace('[', '(').replace(']', ')')

query_string = f"""
SELECT
    column_1,
    column_2,
    column_3
FROM my_database.my_table
WHERE
    column_1 IN {data_list}
"""

result_set = %sql $query_string

Upvotes: 2

Related Questions