Reputation: 157
I need to create the following report scalable:
query = """
(SELECT
'02/11/2019' as Week_of,
media_type,
campaign,
count(ad_start_ts) as frequency
FROM usotomayor.digital
WHERE ds between 20190211 and 20190217
GROUP BY 1,2,3)
UNION ALL
(SELECT
'02/18/2019' as Week_of,
media_type,
campaign,
count(ad_start_ts) as frequency
FROM usotomayor.digital
WHERE ds between 20190211 and 20190224
GROUP BY 1,2,3)
"""
#Converting to dataframe
query2 = spark.sql(query).toPandas()
query2
However, as you can see I cannot make this report scalable if I have a long list of dates for each SQL query that I need to union.
My first attempt at looping in a list of date variables into the SQL script is as follows:
dfys = ['20190217','20190224']
df2 = ['02/11/2019','02/18/2019']
for i in df2:
date=i
for j in dfys:
date2=j
query = f"""
SELECT
'{date}' as Week_of,
raw.media_type,
raw.campaign,
count(raw.ad_start_ts) as frequency
FROM usotomayor.digital raw
WHERE raw.ds between 20190211 and {date2}
GROUP BY 1,2,3
"""
#Converting to dataframe
query2 = spark.sql(query).toPandas()
query2
However, this is not working for me. I think I need to loop through the sql query itself, but I don't know how to do this. Can someone help me?
Upvotes: 0
Views: 1786
Reputation: 1083
As a commenter said "this is not working for me" is not very specific so let's start at specifying the problem. You need to execute a query for each pair of dates you need to execute these queries as a loop and save the result (or actually union them, but then you need to change your query logic).
You could do it like this:
dfys = ['20190217', '20190224']
df2 = ['02/11/2019', '02/18/2019']
query_results = list()
for start_date, end_date in zip(dfys, df2):
query = f"""
SELECT
'{start_date}' as Week_of,
raw.media_type,
raw.campaign,
count(raw.ad_start_ts) as frequency
FROM usotomayor.digital raw
WHERE raw.ds between 20190211 and {end_date}
GROUP BY 1,2,3
"""
query_results.append(spark.sql(query).toPandas())
query_results[0]
query_results[1]
Now you get a list of your results (query_results).
Upvotes: 3