Reputation: 1476
I'm learning BigQuery API using Python Client Libraries v0.28 https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html#run-a-simple-query
Wrote this simple code to fetch data from the table
1) Create client object
client_ = bigquery.Client.from_service_account_json('/Users/xyz/key.json')
2) Begin new Async query job
QUERY = 'SELECT visitid FROM `1234567.ga_sessions_20180101`'
query_job = client_.query(QUERY
, job_id=str(uuid.uuid4()))
3) poll until the query is DONE
while (query_job.state == 'RUNNING'):
time.sleep(5)
query_job.reload()
4) Fetch the results in iteration
query_job.reload()
iter = query_job.result()
At this stage I'd like to fetch how many rows are in the table. As per the doc GitHub code iter
is of type bigquery.table.RowIterator
with a property [tier.total_rows][1]
5) However, at this stage when I print:
print(iter.total_rows)
It keeps returning None
I'm pretty sure this table is NOT empty an dry query is correctly formatted!
Any help to any pointers what am I missing here will be really helpful... Thanks a lot!
Cheers!
Upvotes: 4
Views: 1873
Reputation: 2157
The current behavior of how RowIterator
returns None
is indeed perplexing. Luckily, according to this issue, tswast's comment from 10 days ago indicates that the developers are working on a better solution.
.total_rows
Currently, .total_rows
is initialized only once iteration begins. (In what follows, for clarity I renamed your iter
variable to row_iter
.)
row_iter = query_job.result()
itr = iter(row_iter)
first_row = next(itr)
print(row_iter.total_rows) # Now you get a number instead of None.
This is ugly because to continue the iteration, we must either handle the first row differently or call row_iter = query_job.result()
again.
A currently-working alternative is to use the value of query_job._query_results.total_rows
. Unfortunately this is cheating because _query_results
is private, so there is no reason to expect that this will work in the future.
If tswast's proposal is implemented, then row_iter.total_rows
will be initialized at the beginning, just as you expect.
In my code, I'm going to use something like
try:
num_rows = row_iter.total_rows or query_job._query_results.total_rows
except NameError:
num_rows = None
to be compatible with future behavior while falling-back to the temporary workaround if necessary.
Upvotes: 0
Reputation: 606
You need to also check query_job.error_result to make sure query succeeded.
You can also see your job in the UI, which can be useful for debugging, using project id and job id:
https://bigquery.cloud.google.com/results/projectid:jobid
Also, query_job.result() already waits for the job completion so you don't need to poll.
Upvotes: 2