inneb
inneb

Reputation: 1110

Google BigQuery API Python

I am trying to generate queries in python and querying with them. I am working with pandas_gbq. My code looks like this:

def generate_query(
    filter=['CENTRAL BANK','DRAGHI','FRANKFURT'],
    date ='20171214',
    datetimeformat='%Y%m%d',
    weekly_data=True
):
    filter = str(filter).replace('[','').replace(']','')
    if weekly_data == False:
        query = '''SELECT * FROM `gdelt-bq.gdeltv2.events` WHERE SQLDATE = {date}'''.format(date = date)
    else:
        date = datetime.datetime.strptime(date, datetimeformat)
        week = generate_week(date)
        query = '''SELECT * FROM `gdelt-bq.gdeltv2.events` WHERE SQLDATE IN ({week})'''.format(
            week = week).replace('[','').replace(']','')
    return query, date

Did not figure how to indent the code on stackoverflow. The problem occurs when trying to query multiple dates:

'SELECT * FROM `gdelt-bq.gdeltv2.events` WHERE SQLDATE IN (\\'20171211\\', \\'20171212\\', \\'20171213\\', \\'20171214\\', \\'20171215\\')'

I have no idea where the backslashes are coming from. Any idea why this is happening?

The function generate_week:

def generate_week(date):
week = []
referenceday = date - datetime.timedelta(days = date.weekday())
for i in range(0,5):
    day = referenceday + datetime.timedelta(days = i)
    week.append(day.strftime('%Y%m%d'))
return week

The returned error is the following:

pandas_gbq.gbq.GenericGBQException: Reason: 400 No matching signature for operator IN for argument types INT64 and {STRING} at [1:55]

I would be extremely thankful for some help here! :)

Upvotes: 0

Views: 318

Answers (1)

ZaxR
ZaxR

Reputation: 5135

I believe your issue is with the date format. Unless you specify otherwise standard BQ expects yyyy-mm-dd format. Read more here. If you use strftime("%Y-%m-%d") that should be enough.

Also worth pointing out that:

  1. If you use tuples instead of lists, SQL will share the same syntax as the Python, so you could avoid replacing the square brackets (makes life easier).
  2. You can use SQL's BETWEEN with the beginning and ending dates (inclusive) instead of listing every date if its a continuous, sequential time period.
  3. If you're using Python 3.6+ f-strings are a little nicer than .format in this case. Including the change in bullet 1 (but not bullet 2), you could instead write:

    f'''SELECT * FROM `gdelt-bq.gdeltv2.events` WHERE SQLDATE IN {week}'''

Upvotes: 2

Related Questions