Reputation: 53
I have a table in a database containing millions of rows pertaining to transactions spanning across 10+ years. Since it obviously is a waste to import them all I am trying to import a subset of data limited to particular month ranges. When I try to use the below code as a test to the connection and import top 1000 rows it works fine, but then when I specify a date range in the where clause it returns an empty data frame.
I would really appreciate any help I can get to correct this. Thanks in advance
import pyodbc
import pandas as pd
conn = pyodbc.connect('Driver={SQL Server};'\
'Server=NAME;'\
'Database=DBNAME;'\
'Trusted_Connection=yes;')
tquery = """SELECT TOP (1000) * FROM [SALES Transactions_V];"""
df = pd.read_sql_query(tquery, conn)
df.dtypes
Output:
DW_Id int64
Company object
Campaign Initiative object
Closing Entry bool
Department Code object
Description object
Document No object
Document Type int64
Entry No int64
Expense Type object
GL Account No object
Incremental Field datetime64[ns]
Posting Date datetime64[ns]
Strategic Initiative object
Vendor No object
Vendor Name object
Amount float64
GBP Amount float64
Actual per CWT object
DW_Batch int64
DW_SourceCode object
DW_TimeStamp datetime64[ns]
dtype: object
df.head()
DW_Id Company Campaign Initiative Closing Entry Department Code Description Document No Document Type Entry No Expense Type ... Posting Date Strategic Initiative Vendor No Vendor Name Amount GBP Amount Actual per CWT DW_Batch DW_SourceCode DW_TimeStamp
0 1 ABC Co.,LLC None False AGDATA INC. PMJ10000 1 1 None ... 2007-02-27 None None None -125.25 0.0 None 13726 Nav 2020-05-11 08:50:37.437
1 2 ABC Co.,LLC None False AGDATA INC. PMJ10000 1 2 None ... 2007-02-27 None AGD01 AGDATA, INC. 125.25 0.0 None 13726 Nav 2020-05-11 08:50:37.437
2 3 ABC Co.,LLC None False AGDATA INC. PMJ10000 1 3 None ... 2007-02-27 None AGD01 AGDATA, INC. 125.25 0.0 None 13726 Nav 2020-05-11 08:50:37.437
however when I use the below code to filter for date range between 04-01-2020 AND 04-30-2020 it gives me an empty dataframe
df1 = pd.read_sql_query('SELECT * FROM [SALES Transactions_V] WHERE [Posting Date] BETWEEN ''2020-04-01'' AND ''2020-04-30'';', conn)
df1.dtypes
DW_Id object
Company object
Campaign Initiative object
Closing Entry object
Department Code object
Description object
Document No object
Document Type object
Entry No object
Expense Type object
GL Account No object
Incremental Field object
Posting Date object
Strategic Initiative object
Vendor No object
Vendor Name object
Amount object
GBP Amount object
Actual per CWT object
DW_Batch object
DW_SourceCode object
DW_TimeStamp object
dtype: object
I believe the date range where clause is what is throwing this off but I am unable to find a solution to fix this and would really appreciate any input. Thanks!
Upvotes: 1
Views: 1569
Reputation: 107707
Consider parameterization, the industry best practice, when passing values to an SQL query and is supported with pyobbc
and pandas.read_sql_query
. Doing so you avoid the need to escape quotes and concatenate or interpolate literal values or variables.
sql = '''SELECT * FROM [SALES Transactions_V]
WHERE [Posting Date] BETWEEN ? AND ?;
'''
df1 = pd.read_sql_query(sql, conn, params=['2020-04-01', '2020-04-30'])
Or by date parts:
sql = '''SELECT * FROM [SALES Transactions_V]
WHERE YEAR([Posting Date]) = ?
AND MONTH([Posting Date]) = ?;
'''
df1 = pd.read_sql_query(sql, conn, params=[2020, 4])
Upvotes: 2