How to order and read from a very large bigquery table line by line in Python?

Question

Here is a piece of code I use to read from large bigquery tables line by line in python:

client = bigquery.Client('YOUR_CLIENT_NAME'); 
conn = dbapi.connect(client);
cursor = conn.cursor();
cursor.execute('SELECT * FROM MY_LARGE_TABLE ORDER BY COLUMN_A');
line = cursor.fetchone();
while line != None:
      print('Do something with line')
      line = cursor.fetchone();

And this works fine for some tables. However, it is showing the following error for very large tables:

google.cloud.bigquery.dbapi.exceptions.DatabaseError: 403 Response too large to return. Consider specifying a destination table in your job configuration. For more details, see https://cloud.google.com/bigquery/troubleshooting-errors

Basically, I have a very large table, MY_LARGE_TABLE, on CGP. There is a column in that table, COLUMN_A. I need to iterate over the table (in python) and extract all records with the same COLUMN_A and do some analysis on those records, and repeat this for all unique COLUMN_A values. My plan was (see the above python script) to use ORDER BY COLUMN_A in my query so that the results returned by cursor.execute() are ordered and all records with the same COLUMN_A are next to each other, and I can iterate over the table using fetchone() and do the task in one pass.

Ricco D · Accepted Answer

As confirmed by @khemedi, providing a destination table as suggested by the error solves the issue. See code snippet on adding a destination table on execute.

curr.execute(query,job_config=QueryJobConfig(destination="your_project.your_dataset.your_dest_table"))

How to order and read from a very large bigquery table line by line in Python?

Answers (1)

Related Questions