Leonel

Reputation: 29189

BigQuery execution time

After I send a query to be executed in BigQuery, how can I find the execution time and other statistics about this job?

The Python API for BigQuery has a promising-looking field, timeline, but it is barely mentioned in the documentation, and in my case it returns an empty iterable.

So far, I have been running this Python code, with the BigQuery Python API.

from google.cloud import bigquery

bq_client = bigquery.Client()
job = bq_client.query(some_sql_query, location="US")  # some_sql_query is any SQL string
ts = list(job.timeline)
if len(ts) > 0:
    for t in ts:
        print(t)
else:
    print('No timeline results ???')    # <-- no timeline results.

Upvotes: 4

Views: 1969

Answers (2)

Grzegorz Gwoźdź

Reputation: 9

In case you need job logs to be available after some time and ready for analysis, I'd suggest creating a log sink for your BigQuery jobs. It creates a table in BigQuery where the job logs are dumped. After that you can easily run analyses to determine how long jobs took and how much they cost.

First, create a sink in Operations/Logging (e.g. named "BigQuery job logs"). Make sure to set the sink destination to a BigQuery dataset (pick one) and add query_job_completed to the logs included in the sink. After a while you will see a new table in that dataset named similar to: {project}.{dataset}.cloudaudit_googleapis_com_data_access_{date}
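
If you prefer to script that step instead of using the console, here is a minimal sketch with the google-cloud-logging client. The project ID, dataset, and sink name below are placeholders, and the log filter is my assumption of what matches the legacy BigQuery query_job_completed audit entries:

from google.cloud import logging

client = logging.Client(project="my-project")  # placeholder project ID

# Filter for completed query jobs in the (legacy) BigQuery audit logs.
log_filter = (
    'resource.type="bigquery_resource" AND '
    'protoPayload.serviceData.jobCompletedEvent.eventName="query_job_completed"'
)

sink = client.sink(
    "bigquery-job-logs",  # arbitrary sink name
    filter_=log_filter,
    destination="bigquery.googleapis.com/projects/my-project/datasets/my_dataset",
)
sink.create()  # the sink's writer identity still needs write access to the dataset
print("Created sink:", sink.name)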

Go to BigQuery and create a view. For example, this view will show how much data each job consumed (and how much it cost):

SELECT
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes AS totalBilledBytes,
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.endTime AS endTime,
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobConfiguration.query.query,
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobConfiguration.query.destinationTable.datasetId AS targetDataSet,
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobConfiguration.query.destinationTable.tableId AS targetTable,
    protopayload_auditlog.authenticationInfo.principalEmail,
    (protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes / 1000000000000.0) * 5.0 AS billed
FROM `{project}.{dataset}.cloudaudit_googleapis_com_data_access_*`
WHERE protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes > 0
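
The billed column divides totalBilledBytes by 10^12 to get terabytes and multiplies by 5.0, which assumes the classic $5-per-TB on-demand rate; adjust the factor for your actual pricing. If you save this as a view and want to pull the results from Python, a small sketch (the view name my_dataset.job_costs is a placeholder for wherever you saved it):

from google.cloud import bigquery

bq_client = bigquery.Client()
sql = """
    SELECT principalEmail, SUM(billed) AS total_cost
    FROM `my-project.my_dataset.job_costs`  -- placeholder view name
    GROUP BY principalEmail
    ORDER BY total_cost DESC
"""
for row in bq_client.query(sql).result():
    print(row.principalEmail, row.total_cost)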

Upvotes: -1

Cylldby

Reputation: 1978

When your job is "DONE", the BigQuery API returns a JobStatistics object (described here) that is accessible in Python through the attributes of your job object.

The available attributes are listed here.

To get the time taken by your job, you mainly have the job.created, job.started, and job.ended attributes.

To come back to your code snippet, you can try something like this:

from time import sleep

from google.cloud import bigquery

bq_client = bigquery.Client()  # note: Client, not client
job = bq_client.query(some_sql_query, location="US")

# Poll until the job leaves the RUNNING state.
while job.running():
    print("Running..")
    sleep(0.1)

print("The job duration was {}".format(job.ended - job.started))

Upvotes: 5
