Steven
Steven

Reputation: 928

How can I produce a sorted export of a large BigQuery table?

I would like to produce a sorted CSV export of a large BigQuery table in Google Cloud Storage. Currently to do this, we start with an unsorted table, then do a SELECT * FROM table ORDER BY col1, col2 into another table, and then export that table to GCS. This works well, since the export seems to use the insertion order of the second table.

However, ORDER BY has limitations, as discussed here and here. We get the dreaded "Resources exceeded during query execution." error when we try this on a sufficiently large table.

Is there another way to perform this?

Upvotes: 3

Views: 2280

Answers (1)

Felipe Hoffa
Felipe Hoffa

Reputation: 59175

There are no guarantees that exported data will follow any particular order. I'd suggest to export the table and then sort it with other methods like those detailed in https://unix.stackexchange.com/questions/120096/how-to-sort-big-files.

Since we are talking about a large export, you could pre-partition data before exporting (for example, by year) to help the external sorting algorithm.

Upvotes: 4

Related Questions