Reputation: 1
I have a SQL query that uses the first and the last day of the calendar months to generate a subset of data for a given month. I have been trying to figure out how to loop it for a number of months - i have two lists (one for first and another for last days), two tuples (same), and a dictionary (first and last are keys and values) with all these dates - and store all results in one dataframe and i am struggling very bad.
I can do loop and get all the data if i am only using one list or tuple - then i can loop through it and get all the data. if i try to use two, it simply does not work. Is there a way to do what I am trying to do?
fd=['2018-05-01','2018-06-01','2018-07-01']
ld=['2018-05-31','2018-06-30','2018-07-31']
my_dict=dict(zip(fd, ld))
data_check=pd.DataFrame()
fd_d=','.join(my_dict.keys())
ed_d=','.join(['%%(%s)s' % x for x in my_dict])
query= """
SELECT count(distinct ids),first_date, last_date from table1
where first_date=%s and last_date =%s
group by 2,3
"""
for x in my_dict:
df=pd.read_sql(query% (fd_d,ed_d),my_dict)
data_check=data_check.append(df)
Upvotes: 0
Views: 1051
Reputation: 107687
In general, please heed three best practices:
Avoid the quadratic copy of using DataFrame.append
in a loop. Instead, build a list of data frames to be concatenated once outside the loop.
Use parameterization and not string concatenation which is supported with pandas read_sql
. This avoids the need to string format and punctuate with quotes.
Discontinue using the modulo operator, %
, for string concatenation as it is de-emphasised (not officially deprecated). Instead, use the superior str.format
.
Specifically, for your needs iterate elementwise between two lists using zip
without layering it in a dictionary:
query= """SELECT count(distinct ids), first_date, last_date
FROM table1
WHERE first_date = %s and last_date = %s
GROUP BY 2, 3"""
df_list = []
for f, l in zip(fd, ld):
df = pd.read_sql(query, conn, params=[f, l])
df_list.append(df)
final_df = pd.concat(df_list)
Alternatively, avoid the loop and parameters by aggregating on first and last of days of every month in table:
query= """SELECT count(distinct ids), first_date, last_date
FROM table1
WHERE DATE_PART(d, first_date) = 1
AND last_date = LAST_DAY(first_date)
GROUP BY 2, 3
ORDER BY 2, 3"""
final_df = pd.read_sql(query, conn)
Upvotes: 1