Reputation: 14563
I need to download a relatively small table from BigQuery and store it (after some parsing) in a pandas DataFrame.
Here is the relevant sample of my code:
import pandas as pd
from google.cloud import bigquery

client = bigquery.Client(project="project_id")
job_config = bigquery.QueryJobConfig(allow_large_results=True)
query_job = client.query("my sql string", job_config=job_config)
result = query_job.result()  # a bigquery.table.RowIterator
rows = [dict(row) for row in result]
pdf = pd.DataFrame.from_dict(rows)
My problem:
After a few thousand rows are parsed, one of them is too big and I get an exception: google.api_core.exceptions.Forbidden.
So, after a few iterations, I tried to transform my loop into something like:
rows = list()
for _ in range(result.total_rows):
    try:
        rows.append(dict(next(result)))
    except google.api_core.exceptions.Forbidden:
        pass
BUT it doesn't work, since result is a bigquery.table.RowIterator and, despite its name, it's not an iterator... it's an iterable.
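For what it's worth, wrapping it with iter() does give a true iterator, so a loop like the one above at least runs. A minimal sketch, using the same result object as before:

import google.api_core.exceptions

it = iter(result)  # iter() turns the iterable RowIterator into a real iterator
rows = []
for _ in range(result.total_rows):
    try:
        rows.append(dict(next(it)))
    except StopIteration:
        break
    except google.api_core.exceptions.Forbidden:
        # Caveat: a generator that raises is closed afterwards, so the next
        # call to next(it) raises StopIteration -- everything after the
        # failing fetch is lost, not just the one oversized row.
        pass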
So... what do I do now? Is there a way to either:
Upvotes: 0
Views: 207
Reputation: 10222
Did you try paging through query results?
from google.cloud import bigquery
# Construct a BigQuery client object.
client = bigquery.Client()
query = """
SELECT name, SUM(number) as total_people
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY name
ORDER BY total_people DESC
"""
query_job = client.query(query) # Make an API request.
query_job.result() # Wait for the query to complete.
# Get the destination table for the query results.
#
# All queries write to a destination table. If a destination table is not
# specified, BigQuery populates it with a reference to a temporary
# anonymous table after the query completes.
destination = query_job.destination
# Get the schema (and other properties) for the destination table.
#
# A schema is useful for converting from BigQuery types to Python types.
destination = client.get_table(destination)
# Download rows.
#
# The client library automatically handles pagination.
print("The query data:")
rows = client.list_rows(destination, max_results=20)
for row in rows:
    print("name={}, count={}".format(row["name"], row["total_people"]))
You can also try to filter out big rows in your query:
WHERE LENGTH(some_field) < 123
or
WHERE LENGTH(CAST(some_field AS BYTES)) < 123
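In context, the filter just goes into the query string (table and field names below are placeholders):

query = """
    SELECT *
    FROM `project_id.dataset.my_table`
    WHERE LENGTH(CAST(some_field AS BYTES)) < 123
"""
query_job = client.query(query)  # oversized rows never reach the client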
Upvotes: 1