favq

Reputation: 789

Python BigQuery client - setting query result timeout

Consider the following script (adapted from the Google Cloud Python documentation: https://google-cloud-python.readthedocs.io/en/0.32.0/bigquery/usage.html#querying-data), which runs a BigQuery query with a timeout of 30 seconds:

import logging

from google.cloud import bigquery

# Set logging level to DEBUG in order to see the HTTP requests
# being made by urllib3
logging.basicConfig(level=logging.DEBUG)

PROJECT_ID = "project_id" # replace by actual project ID

client = bigquery.Client(project=PROJECT_ID)

QUERY = ('SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
        'WHERE state = "TX" '
        'LIMIT 100')
TIMEOUT = 30  # in seconds
query_job = client.query(QUERY)  # API request - starts the query
assert query_job.state == 'RUNNING'

# Waits for the query to finish
iterator = query_job.result(timeout=TIMEOUT)
rows = list(iterator)

assert query_job.state == 'DONE'
assert len(rows) == 100
row = rows[0]
assert row[0] == row.name == row['name']

The linked documentation says:

Use of the timeout parameter is optional. The query will continue to run in the background even if it takes longer the timeout allowed.

When I run it with google-cloud-bigquery version 1.23.1, the logging output seems to indicate that "timeoutMs" is 10 seconds:

DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "GET /bigquery/v2/projects/project_id/queries/5ceceaeb-e17c-4a86-8a27-574ad561b856?maxResults=0&timeoutMs=10000&location=US HTTP/1.1" 200 None

Notice the timeoutMs=10000 in the output above.

This seems to happen whenever I call result() with a timeout greater than 10. If I use a timeout lower than 10, on the other hand, the timeoutMs value looks correct. For example, if I change TIMEOUT = 30 to TIMEOUT = 5 in the script above, the log shows:

DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "GET /bigquery/v2/projects/project_id/queries/71a28435-cbcb-4d73-b932-22e58e20d994?maxResults=0&timeoutMs=4900&location=US HTTP/1.1" 200 None

Is this behavior expected?

Thank you in advance and best regards.

Upvotes: 2

Views: 6305

Answers (1)

Tlaquetzal

Reputation: 2850

The timeout parameter is applied on a best-effort basis: the method tries to complete all of its API calls within the given time frame. Internally, result() can make more than one request, and the getQueryResults request in the log:

DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "GET /bigquery/v2/projects/project_id/queries/5ceceaeb-e17c-4a86-8a27-574ad561b856?maxResults=0&timeoutMs=10000&location=US HTTP/1.1" 200 None

is executed inside the done() method. The source code shows how the timeout for each request is calculated; in essence, it is the smaller of 10 seconds and the user-supplied timeout. If the job has not completed when a request returns, the request is retried until the user timeout has been reached.
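To make the calculation concrete, here is a minimal sketch of the idea. It is not the library's actual code. The helper names are hypothetical, the 10-second cap comes from the log above, and the 0.1-second buffer is an assumption inferred from the timeoutMs=4900 that the question reports for TIMEOUT = 5.

```python
# Illustrative sketch only; not the google-cloud-bigquery implementation.
MAX_REQUEST_TIMEOUT_MS = 10_000  # cap seen in the log (timeoutMs=10000)
BUFFER_MS = 100  # assumption: inferred from timeoutMs=4900 for TIMEOUT = 5


def request_timeout_ms(remaining_ms):
    """Return the timeoutMs to send on one polling request (hypothetical helper)."""
    # Leave a small buffer, cap at 10 seconds, and never go below zero.
    return max(min(remaining_ms - BUFFER_MS, MAX_REQUEST_TIMEOUT_MS), 0)


def polling_schedule(user_timeout_secs):
    """List the timeoutMs values of successive polls.

    Assumes the worst case: the job never finishes and every poll
    uses up its full per-request timeout.
    """
    remaining_ms = int(user_timeout_secs * 1000)
    schedule = []
    while True:
        timeout_ms = request_timeout_ms(remaining_ms)
        if timeout_ms <= 0:
            break  # user timeout exhausted
        schedule.append(timeout_ms)
        remaining_ms -= timeout_ms
    return schedule


print(polling_schedule(30))  # [10000, 10000, 9900]
print(polling_schedule(5))   # [4900]
```

Under these assumptions, a 30-second timeout is spent as several requests of at most 10 seconds each, which would explain why the first request in the log carries timeoutMs=10000 rather than 30000.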

Upvotes: 1
