Reputation: 806
Here is a piece of code I use to read from large BigQuery tables row by row in Python:
from google.cloud import bigquery
from google.cloud.bigquery import dbapi

client = bigquery.Client('YOUR_PROJECT_NAME')  # first argument is the GCP project ID
conn = dbapi.connect(client)
cursor = conn.cursor()
cursor.execute('SELECT * FROM MY_LARGE_TABLE ORDER BY COLUMN_A')
line = cursor.fetchone()
while line is not None:
    print('Do something with line')
    line = cursor.fetchone()
And this works fine for some tables. However, for very large tables it throws the following error:
google.cloud.bigquery.dbapi.exceptions.DatabaseError: 403 Response too large to return. Consider specifying a destination table in your job configuration. For more details, see https://cloud.google.com/bigquery/troubleshooting-errors
Basically, I have a very large table, MY_LARGE_TABLE, on GCP. There is a column in that table, COLUMN_A. I need to iterate over the table (in Python), extract all records with the same COLUMN_A, do some analysis on those records, and repeat this for all unique COLUMN_A values. My plan (see the Python script above) was to use ORDER BY COLUMN_A in my query so that the results returned by cursor.execute() are ordered and all records with the same COLUMN_A are next to each other, letting me iterate over the table with fetchone() and do the task in one pass.
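For what it's worth, that one-pass grouping can be written with itertools.groupby, since ORDER BY makes equal COLUMN_A values adjacent in the stream. A minimal sketch, assuming the cursor from the script above and a hypothetical analyze() standing in for the per-group work:

from itertools import groupby

def iter_rows(cursor):
    # Generator over fetchone() so groupby can consume the stream lazily.
    row = cursor.fetchone()
    while row is not None:
        yield row
        row = cursor.fetchone()

# BigQuery DB-API rows support lookup by column name.
for value, records in groupby(iter_rows(cursor), key=lambda r: r['COLUMN_A']):
    analyze(list(records))  # hypothetical per-group analysis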
Upvotes: 0
Views: 837
Reputation: 7277
As confirmed by @khemedi, providing a destination table, as the error message suggests, solves the issue. See the snippet below for adding a destination table on execute():
cursor.execute(query, job_config=QueryJobConfig(destination="your_project.your_dataset.your_dest_table"))
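Put together with the question's script, a fuller version might look like the sketch below. The project, dataset, and table names are placeholders, and write_disposition='WRITE_TRUNCATE' (overwrite the destination on re-runs) is an assumption; adjust to taste:

from google.cloud import bigquery
from google.cloud.bigquery import dbapi, QueryJobConfig

client = bigquery.Client('your_project')
conn = dbapi.connect(client)
cursor = conn.cursor()

# Route the oversized result set into a destination table; BigQuery
# then pages rows back to the client instead of returning the 403.
job_config = QueryJobConfig(
    destination='your_project.your_dataset.your_dest_table',
    write_disposition='WRITE_TRUNCATE',  # assumption: overwrite on re-runs
)
cursor.execute('SELECT * FROM MY_LARGE_TABLE ORDER BY COLUMN_A', job_config=job_config)

line = cursor.fetchone()
while line is not None:
    # process the row as before
    line = cursor.fetchone()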
Upvotes: 3