Vero
Vero

Reputation: 459

Read SQL in pandas with parameter list or tuple

I have a problem with running an SQL script in pandas with WHERE in clause which is picking members of a list or tuple

    tuple=(1,2,3,4,5,6, 7) # there are 2228 members
    date=20200101

    sql=pd.read_sql_query("""SELECT *                             
                             FROM [MY_TABLE] with (nolock)
                             WHERE [cod] IN (?)
                             and bi_partition>=?""", conn, params=[tuple, date])

The error returned:

('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The data types varchar and ntext are incompatible in the equal to operator. (402) (SQLExecDirectW); [42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)')

When using params =[str(tuple), str(date)] the error is:

 ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The data types varchar and ntext are incompatible in the equal to operator. (402) (SQLExecDirectW); [42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)') 

I have tried to fix it with different solutions I found on StackOverflow but it doesn't work.

Anybody can help?

Thank you

Upvotes: 1

Views: 1244

Answers (1)

Parfait
Parfait

Reputation: 107767

Consider avoiding the use of thousands of separate values in IN clause from a Python tuple. Since values derived from a different query, use a subquery from source of values with IN.

date = "2020-01-01"

sql = """SELECT Col1, Col2, Col3, ...
         FROM [MY_TABLE]
         WHERE bi_partition >= ?
           AND [cod] IN 
               (SELECT [cod] FROM myOtherTable)"""

df = pd.read_sql_query(sql, conn, params=[date])

Also, consider explicitly identifying columns in SELECT for code readability and maintainability where you and readers of your code can see what fields are actually being retrieved and avoid pulling in huge columns like BLOBs or meta identifiers like GUIDs that are not needed for Pandas analytics. See Why is SELECT * considered harmful?

Upvotes: 1

Related Questions