Ocheezy
Ocheezy

Reputation: 141

Using a row in a dataframe inside of a SQL IN statement

I have a row of a dataframe that I am trying to output into a list or tuple, that will go into a "IN" statement in an SQL query. This is the code I have so far.

check_query = f"SELECT DISTINCT SSN AS SS, EnteredDate, CONVERT(VARCHAR, D, 23) AS F_D FROM PD WHERE EnteredDate IS NOT NULL AND SSN IN {tuple(list(df_to_add_pc['SS']))}"
print(check_query)

The output I am currently receiving is:

Execution failed on sql 'SELECT DISTINCT SSN AS SS, EnteredDate, CONVERT(VARCHAR, D, 23) AS F_D FROM PD WHERE EnteredDate IS NOT NULL AND SSN IN (1, 12, 123, 1234, 12345)'

Here is the Dataframe column list:

col_list = ["SS", "LN", "FN", "DB", "G", "F_D", "SRC", "Q_FACTOR"]

It is using them as integers and they need to be strings. Is there any way convert these to a string and output them into the query the same way?

Note: The SSN's inside of this question are obviously changed for security reasons. Also the variable type must be string as some of these SSN's have identifiers in front of them.

Thanks in advance!

Upvotes: 1

Views: 185

Answers (1)

Milan Cermak
Milan Cermak

Reputation: 8064

You need to build the whole IN statement, with the SQL strings yourself. Try the following:

in_statement = ', '.join([f"'{i}'" for i in df_to_add_pc['SS']])
check_query = f"SELECT DISTINCT SSN AS SS, EnteredDate, CONVERT(VARCHAR, D, 23) AS F_D FROM PD WHERE EnteredDate IS NOT NULL AND SSN IN ({in_statement})"

Or if you want a messy one-liner:

check_query = f"""SELECT DISTINCT SSN AS SS, EnteredDate, CONVERT(VARCHAR, D, 23) AS F_D FROM PD WHERE EnteredDate IS NOT NULL AND SSN IN ({', '.join(["'"+str(i)+"'" for i in df_to_add_pc['SS']])})"""

Note that this leaves the code open for potential SQL Injection attacks. You should use query placeholders to prevent this. Please consult the documentation of the SQL lib you're using for details.

Upvotes: 1

Related Questions