augustin-barillec

Reputation: 531

BigQuery: job is done but job.query_results().total_bytes_processed returns None

The following code:

import time
from google.cloud import bigquery
client = bigquery.Client()
query = """\
select 3 as x
"""
dataset = client.dataset('dataset_name')
table = dataset.table(name='table_name')
job = client.run_async_query('job_name_76', query)
job.write_disposition = 'WRITE_TRUNCATE'
job.destination = table
job.begin()
retry_count = 100
while retry_count > 0 and job.state != 'DONE':
    retry_count -= 1
    time.sleep(10)
    job.reload()
print(job.state)
print(job.query_results().name)
print(job.query_results().total_bytes_processed)

prints:

DONE
job_name_76
None

I do not understand why total_bytes_processed returns None, because the job is done and the documentation says:

total_bytes_processed:

Total number of bytes processed by the query.

See https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#totalBytesProcessed

Return type: int, or NoneType

Returns: Count generated on the server (None until set by the server).

Upvotes: 0

Views: 1137

Answers (1)

Willian Fuks

Reputation: 11777

Looks like you are right. As you can see in the code, the current client library does not read the bytes-processed statistics from the server response, so total_bytes_processed stays None.

This has been reported in this issue, and as you can see in tseaver's PR, the feature has already been implemented and is awaiting review and merging, so it will probably be released quite soon.

In the meantime, you could get the result from the _properties attribute of the job, like:

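import os
import time

from google.cloud.bigquery import Client

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/key.json'


def wait_job(job, retries=100, delay=10):
    # Helper assumed by this answer: poll until the job is done,
    # using the same loop as in the question.
    while retries > 0 and job.state != 'DONE':
        retries -= 1
        time.sleep(delay)
        job.reload()


bc = Client()
query = 'your query'
job = bc.run_async_query('name', query)
job.begin()
wait_job(job)

# _properties is a private attribute holding the raw job resource from the server.
query_results = job._properties['statistics'].get('query')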

query_results should be a dict containing the totalBytesProcessed value you are looking for.
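As a rough sketch of how that value could be read safely, the helper below takes a dict shaped like job._properties and returns the byte count as an int, or None when the server has not set it yet. The function name and the sample payload are made up for illustration; the sketch also assumes the REST API delivers totalBytesProcessed as a string, hence the int() conversion.

```python
def total_bytes_processed(properties):
    """Return totalBytesProcessed as an int, or None if it is not set.

    `properties` is assumed to be a dict shaped like job._properties.
    """
    stats = properties.get('statistics') or {}
    query_stats = stats.get('query') or {}
    raw = query_stats.get('totalBytesProcessed')
    # int() accepts both a numeric string and an int.
    return int(raw) if raw is not None else None


# Hypothetical payloads, not real server output.
finished = {'statistics': {'query': {'totalBytesProcessed': '1024'}}}
pending = {'status': {'state': 'RUNNING'}}

print(total_bytes_processed(finished))
print(total_bytes_processed(pending))
```

With a real job you would call total_bytes_processed(job._properties) once the job state is DONE. Since _properties is private, this may break in a later release of the library.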

Upvotes: 3
