Reputation: 459
I have a problem running a SQL query in pandas with a WHERE ... IN clause whose members come from a Python list or tuple:
tuple = (1, 2, 3, 4, 5, 6, 7)  # there are 2228 members
date = 20200101
sql = pd.read_sql_query("""SELECT *
                           FROM [MY_TABLE] WITH (NOLOCK)
                           WHERE [cod] IN (?)
                             AND bi_partition >= ?""", conn, params=[tuple, date])
The error returned:
('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The data types varchar and ntext are incompatible in the equal to operator. (402) (SQLExecDirectW); [42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)')
When using params=[str(tuple), str(date)] the error is the same:
('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The data types varchar and ntext are incompatible in the equal to operator. (402) (SQLExecDirectW); [42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Statement(s) could not be prepared. (8180)')
I have tried different solutions I found on Stack Overflow, but none of them work.
Can anybody help?
Thank you
Upvotes: 1
Views: 1244
Reputation: 107767
Consider avoiding the use of thousands of separate values in an IN clause passed from a Python tuple. Since the values derive from a different query, use a subquery against the source of those values inside IN:
date = "2020-01-01"

sql = """SELECT Col1, Col2, Col3, ...
         FROM [MY_TABLE]
         WHERE bi_partition >= ?
           AND [cod] IN
               (SELECT [cod] FROM myOtherTable)"""

df = pd.read_sql_query(sql, conn, params=[date])
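If the values really only exist on the Python side, a common alternative (a minimal sketch, not part of the answer above) is to generate one placeholder per value so pyodbc binds each member individually instead of serializing the whole tuple as a single string, which is what triggers the varchar/ntext error. Be aware that SQL Server limits a parameterized request to roughly 2,100 parameters, so with ~2,228 codes you would have to chunk the list; that is another reason the subquery shown above is preferable. The names codes and conn below are illustrative / taken from the question.

codes = (1, 2, 3, 4, 5, 6, 7)   # illustrative; in reality ~2228 codes
date = 20200101

# build "?, ?, ?, ..." with one placeholder per value
placeholders = ", ".join("?" * len(codes))

sql = f"""SELECT Col1, Col2, Col3
          FROM [MY_TABLE]
          WHERE [cod] IN ({placeholders})
            AND bi_partition >= ?"""

# params must be a flat sequence: every code first, then the date
df = pd.read_sql_query(sql, conn, params=[*codes, date])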
Also, consider explicitly listing columns in SELECT
for readability and maintainability: you and readers of your code can see which fields are actually being retrieved, and you avoid pulling in huge columns like BLOBs or metadata identifiers like GUIDs that are not needed for pandas analytics. See Why is SELECT * considered harmful?
Upvotes: 1