user11941538
user11941538

Reputation: 1

Loop SQL Query Over Dictionary

I have a SQL query that uses the first and the last day of the calendar months to generate a subset of data for a given month. I have been trying to figure out how to loop it for a number of months - i have two lists (one for first and another for last days), two tuples (same), and a dictionary (first and last are keys and values) with all these dates - and store all results in one dataframe and i am struggling very bad.

I can do loop and get all the data if i am only using one list or tuple - then i can loop through it and get all the data. if i try to use two, it simply does not work. Is there a way to do what I am trying to do?

fd=['2018-05-01','2018-06-01','2018-07-01']
ld=['2018-05-31','2018-06-30','2018-07-31']
my_dict=dict(zip(fd, ld))
data_check=pd.DataFrame()

fd_d=','.join(my_dict.keys())
ed_d=','.join(['%%(%s)s' % x for x in my_dict])

query= """

SELECT count(distinct ids),first_date, last_date  from table1
where first_date=%s and last_date =%s
group by 2,3
"""
for x in my_dict:
    df=pd.read_sql(query% (fd_d,ed_d),my_dict)
    data_check=data_check.append(df)

Upvotes: 0

Views: 1051

Answers (1)

Parfait
Parfait

Reputation: 107687

In general, please heed three best practices:

  1. Avoid the quadratic copy of using DataFrame.append in a loop. Instead, build a list of data frames to be concatenated once outside the loop.

  2. Use parameterization and not string concatenation which is supported with pandas read_sql. This avoids the need to string format and punctuate with quotes.

  3. Discontinue using the modulo operator, %, for string concatenation as it is de-emphasised (not officially deprecated). Instead, use the superior str.format.

Specifically, for your needs iterate elementwise between two lists using zip without layering it in a dictionary:

query= """SELECT count(distinct ids), first_date, last_date 
          FROM table1 
          WHERE first_date = %s and last_date = %s 
          GROUP BY 2, 3""" 

df_list = []
for f, l in zip(fd, ld): 
   df = pd.read_sql(query, conn, params=[f, l]) 
   df_list.append(df)

final_df = pd.concat(df_list)

Alternatively, avoid the loop and parameters by aggregating on first and last of days of every month in table:

query= """SELECT count(distinct ids), first_date, last_date 
          FROM table1 
          WHERE DATE_PART(d, first_date) = 1
            AND last_date = LAST_DAY(first_date)
          GROUP BY 2, 3
          ORDER BY 2, 3""" 

final_df = pd.read_sql(query, conn) 

Upvotes: 1

Related Questions