Reputation: 928
I would like to produce a sorted CSV export of a large BigQuery table in Google Cloud Storage. Currently to do this, we start with an unsorted table, then do a SELECT * FROM table ORDER BY col1, col2
into another table, and then export that table to GCS. This works well, since the export seems to use the insertion order of the second table.
However, ORDER BY
has limitations, as discussed here and here. We get the dreaded "Resources exceeded during query execution." error when we try this on a sufficiently large table.
Is there another way to perform this?
Upvotes: 3
Views: 2280
Reputation: 59175
There are no guarantees that exported data will follow any particular order. I'd suggest to export the table and then sort it with other methods like those detailed in https://unix.stackexchange.com/questions/120096/how-to-sort-big-files.
Since we are talking about a large export, you could pre-partition data before exporting (for example, by year) to help the external sorting algorithm.
Upvotes: 4