Taylor Robbins

Reputation: 35

How do you handle large query results for a simple SELECT in BigQuery with the Python client library?

I wrote 1.6 million records to a table; each record has two columns: an ID and a JSON string.

I want to select all of those records and write the JSON in each row out to its own file. However, the query result is too large, and I get the associated 403: "403 Response too large to return. Consider specifying a destination table in your job configuration."

I've been looking at the documentation below and understand that it recommends specifying a destination table for the results and reading them from there. BUT all I want to do is SELECT * from the table, so that would effectively just copy it over, and I suspect I would run into the same issue when querying the result table.

https://cloud.google.com/bigquery/docs/reference/standard-sql/introduction
https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationQuery.FIELDS.allow_large_results

What is the best practice here? Pagination? Table sampling? list_rows?

I'm using the python client library as stated in the question title. My current code is just this:

# client is a google.cloud.bigquery.Client; the job fails with the 403
# above because the result set exceeds the maximum response size
query = f'SELECT * FROM `{project}.{dataset}.{table}`'
return client.query(query)

I should also mention that the IDs are not sequential; they're just alphanumeric strings.

Upvotes: 0

Views: 998

Answers (1)

Scott B

Reputation: 2954

The best practice, and the most efficient approach, is to export the table to Cloud Storage and then download the exported files, rather than querying the whole table with SELECT *.
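A minimal sketch of that export step with the Python client, reusing the project/dataset/table variables from the question's snippet; the bucket name my-bucket is a placeholder for a Cloud Storage bucket you can write to, and the wildcard URI lets BigQuery shard exports larger than 1 GB across multiple files:

from google.cloud import bigquery

client = bigquery.Client()

# Wildcard URI: BigQuery splits exports larger than 1 GB across files.
destination_uri = 'gs://my-bucket/export/rows-*.json'

# Export the table as newline-delimited JSON, one row per line.
job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
)

extract_job = client.extract_table(
    f'{project}.{dataset}.{table}',
    destination_uri,
    job_config=job_config,
)
extract_job.result()  # block until the export job finishes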

From there, you can extract the data you need from the exported files (e.g. CSV, JSON, etc.) with Python code, without having to wait for a SELECT * query to finish.
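A sketch of the download-and-split step using the google-cloud-storage library, assuming the same placeholder bucket; the column names id and json_col are stand-ins for your actual schema, and each exported file holds one table row per line:

import json

from google.cloud import storage

storage_client = storage.Client()

for blob in storage_client.list_blobs('my-bucket', prefix='export/'):
    # Each exported file is newline-delimited JSON: one table row per line.
    for line in blob.download_as_text().splitlines():
        row = json.loads(line)
        # Write each row's JSON column to its own file, named by its ID.
        with open(f"{row['id']}.json", 'w') as out:
            out.write(row['json_col'])

If an exported shard is too large to hold in memory, blob.open('r') streams it line by line instead of download_as_text().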

Upvotes: 1
