Reputation: 83
I am trying to run a query over and over again for all dates in a date range and collect the results into a Pandas DF for each iteration.
I established a connection (PYODBC) and created a list of dates I would like to run through the SQL query to aggregate into a DF. I confirmed that the dates are a list.
link = pyodbc.connect( Connection Details )
date = [d.strftime('%Y-%m-%d') for d in pd.date_range('2020-10-01','2020-10-02')]
type(date)
I created an empty DF to collect the results for each iteration of the SQL query and checked the structure.
empty = pd.DataFrame(columns = ['Date', 'Balance'])
empty
I have the query set up as so:
sql = """
Select dt as "Date", sum(BAL)/1000 as "Balance"
From sales as bal
where bal.item IN (1,2,3,4)
AND bal.dt = '{}'
group by "Date";
""".format(day)
I tried the following for loop in the hopes of aggregating the results of each query execution into the empty df, but I get a blank df.
for day in date:
a = (pd.read_sql_query(sql, link))
empty.append(a)
Any ideas if the issue is related to the SQL setup and/or for loop? A better more efficient way to tackle the issue?
Upvotes: 0
Views: 867
Reputation: 29
Looks like you didn't defined the day variable when you generated sql. That may help:
def sql_gen(day):
sql = """
Select dt as "Date", sum(BAL)/1000 as "Balance"
From sales as bal
where bal.item IN (1,2,3,4)
AND bal.dt = '{}'
group by "Date";
""".format(day)
return sql
for day in date:
a = (pd.read_sql_query(sql_gen(day), link))
empty.append(a)
Upvotes: 0
Reputation: 107587
Avoid the loop and run a single SQL query by adding Date
as a GROUP BY
column and pass start and end dates as parameters for filtering. And use the preferred parameterization method instead of string formatting which pandas.read_sql
does support:
# PREPARED STATEMENT WITH ? PLACEHOLDERS
sql = """SALES dt AS "Date"
, SUM(BAL)/1000 AS "Balance"
FROM sales
WHERE item IN (1,2,3,4)
AND dt BETWEEN ? AND ?
GROUP BY dt;
"""
# BIND PARAMS TO QUERY RETURN IN SINGLE DATA FRAME
df = pd.read_sql(sql, conn, params=['2020-10-01', '2020-10-02'])
Upvotes: 1