selva kumar
selva kumar

Reputation: 1156

Google BigQuery large table (105M records) with 'Order Each by' clause produce "Resources Exceeds Query Execution" error

I am running into Serious issue "Resources Exceeds Query Execution" when Google Big Query large table (105M records) with 'Order Each by' clause.

Here is the sample query (which using public data set: Wikipedia):

SELECT Id,Title,Count(*) FROM [publicdata:samples.wikipedia] Group EACH by Id, title Order by Id, Title Desc

How to solve this without adding Limit keyword.

Upvotes: 3

Views: 1057

Answers (2)

Tiago Peres
Tiago Peres

Reputation: 15451

I've gone through the same problem and fixed it following the next steps

  1. Run the query without ORDER BY and save in a dataset table.

  2. Export the content from that table to a bucket in GCS using wildcard (BUCKETNAME/FILENAME*.csv)

  3. Download the files to a folder in your machine.

  4. Install XAMPP (if you get a UAC warning) and change some settings after.

  5. Start Apache and MySQL in your XAMPP control panel.

  6. Install HeidiSQL and stablish the connection with your MySQL server (installed with XAMPP)

  7. Create a database and a table with its fields.

  8. Go to Tools > Import CSV file, configure accordingly and import.

  9. Once all data is imported, do the ORDER BY and export the table.

Upvotes: 1

Pentium10
Pentium10

Reputation: 207942

Using order by on big data databases is not an ordinary operation and at some point it exceeds the attributes of big data resources. You should consider sharding your query or run the order by in your exported data.

As I explained to you today in your other question, adding allowLargeResults will allow you to return large response, but you can't specify a top-level ORDER BY, TOP or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.

One option here that you may try is sharding your query.

where ABS(HASH(Id) % 4) = 0

You can play with the above parameters a lot to achieve smaller resultsets and then combining.

Also read Chapter 9 - Understanding Query Execution it explaines how internally sharding works.

You should also read Launch Checklist for BigQuery

Upvotes: 7

Related Questions