Reputation: 1110
I am trying to generate queries in python and querying with them. I am working with pandas_gbq. My code looks like this:
def generate_query(
filter=['CENTRAL BANK','DRAGHI','FRANKFURT'],
date ='20171214',
datetimeformat='%Y%m%d',
weekly_data=True
):
filter = str(filter).replace('[','').replace(']','')
if weekly_data == False:
query = '''SELECT * FROM `gdelt-bq.gdeltv2.events` WHERE SQLDATE = {date}'''.format(date = date)
else:
date = datetime.datetime.strptime(date, datetimeformat)
week = generate_week(date)
query = '''SELECT * FROM `gdelt-bq.gdeltv2.events` WHERE SQLDATE IN ({week})'''.format(
week = week).replace('[','').replace(']','')
return query, date
Did not figure how to indent the code on stackoverflow. The problem occurs when trying to query multiple dates:
'SELECT * FROM `gdelt-bq.gdeltv2.events` WHERE SQLDATE IN (\\'20171211\\', \\'20171212\\', \\'20171213\\', \\'20171214\\', \\'20171215\\')'
I have no idea where the backslashes are coming from. Any idea why this is happening?
The function generate_week:
def generate_week(date):
week = []
referenceday = date - datetime.timedelta(days = date.weekday())
for i in range(0,5):
day = referenceday + datetime.timedelta(days = i)
week.append(day.strftime('%Y%m%d'))
return week
The returned error is the following:
pandas_gbq.gbq.GenericGBQException: Reason: 400 No matching signature for operator IN for argument types INT64 and {STRING} at [1:55]
I would be extremely thankful for some help here! :)
Upvotes: 0
Views: 318
Reputation: 5135
I believe your issue is with the date format. Unless you specify otherwise standard BQ expects yyyy-mm-dd format. Read more here. If you use strftime("%Y-%m-%d")
that should be enough.
Also worth pointing out that:
If you're using Python 3.6+ f-strings are a little nicer than .format in this case. Including the change in bullet 1 (but not bullet 2), you could instead write:
f'''SELECT * FROM `gdelt-bq.gdeltv2.events` WHERE SQLDATE IN {week}'''
Upvotes: 2