user1302884

Reputation: 843

BigQuery max query length (characters) workaround

First, let me explain the problem. I have 500 unique users. The data for each of these users is split into smaller gzip files (let's say 25 files per user on average). We have loaded each split gzip file as a separate table in BigQuery, so our dataset has some 13,000 tables in it.

Now we have to run time-range queries to retrieve some data for each user. We have around 500-1000 different time ranges per user, and we would like to combine all of them into a single query with logical OR and AND:

  WHERE (timestamp > 2 AND timestamp < 3) OR (timestamp > 4 AND timestamp < 5) OR ... and so on, 1000 times

and run it against all 13,000 tables.

Our own tests suggest that BigQuery has a query length limit of around 10,000 characters.

If we split the conditions into multiple queries, we exceed the 20,000-queries-per-day quota limit.

Is there any workaround for this, so that we could run these queries without hitting the daily quota limit?

Thanks

JR

Upvotes: 6

Views: 22068

Answers (3)

user3476374

Reputation: 61

The BigQuery request payload limit is increased to 10 MB instead of 1 MB when you use parameterized queries. That helped me.

This is the error message I got when I probed for the payload size limit of parameterized queries:

{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Request payload size exceeds the limit: 10485760 bytes.",
    "reason" : "badRequest"
  } ],
  "message" : "Request payload size exceeds the limit: 10485760 bytes.",
  "status" : "INVALID_ARGUMENT"
}
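For what it's worth, here is a minimal sketch of how that can be applied to the original problem. The project/dataset/table names, the column name `ts`, and the exact SQL shape are my own assumptions, not something stated in this answer; the idea is just that the time-range bounds travel as array query parameters, so the SQL text stays short while the request payload can grow towards the 10 MB limit.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Hypothetical range bounds; in practice these would be the 500-1000
    # (start, end) pairs for a user, kept as two parallel arrays.
    starts = ["2014-06-25 00:00:00", "2014-06-26 00:00:00"]
    ends   = ["2014-06-25 12:00:00", "2014-06-26 12:00:00"]

    # The SQL text stays tiny; only the parameter payload grows with the
    # number of ranges. DISTINCT guards against a row matching two
    # overlapping ranges.
    sql = """
    SELECT DISTINCT t.*
    FROM `my_project.my_dataset.my_table` AS t,
         UNNEST(@range_starts) AS s WITH OFFSET AS i,
         UNNEST(@range_ends)   AS e WITH OFFSET AS j
    WHERE i = j
      AND t.ts > TIMESTAMP(s)
      AND t.ts < TIMESTAMP(e)
    """

    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ArrayQueryParameter("range_starts", "STRING", starts),
            bigquery.ArrayQueryParameter("range_ends", "STRING", ends),
        ]
    )

    for row in client.query(sql, job_config=job_config).result():
        print(row)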

Upvotes: 3

sid

Reputation: 107

I faced a similar issue with the BigQuery SQL query length limit of 1024K characters when passing a big array of values in the WHERE condition.

To resolve it, I used a parameterized query: https://cloud.google.com/bigquery/docs/parameterized-queries
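For context, here is a minimal sketch of the pattern; the client library usage, table, column, and values below are placeholders of mine, not part of the answer. The big list is passed as an array parameter and referenced with IN UNNEST, so the query text itself stays small.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Placeholder list; the real one can hold thousands of values without
    # inflating the SQL text, since it is sent as a query parameter.
    user_ids = ["user_001", "user_002", "user_003"]

    sql = """
    SELECT *
    FROM `my_project.my_dataset.events`
    WHERE user_id IN UNNEST(@user_ids)
    """

    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ArrayQueryParameter("user_ids", "STRING", user_ids),
        ]
    )

    rows = client.query(sql, job_config=job_config).result()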

Upvotes: 5

Wan B.

Reputation: 18845

I can think of two things:

  • Try reducing the number of tables in the dataset. If they share the same schema, could they be combined (denormalised) into a single table, or at least into fewer tables?

I have loaded 500,000+ JSON gzip files into one table, and querying is much easier (a load sketch follows at the end of this answer).

  • With timestamps, you can try to use a shorter common denominator.

For example, instead of

WHERE (timestamp > "2014-06-25 00:00:00" AND timestamp < "2014-06-26 00:00:00")

you could write

WHERE LEFT(timestamp,10) = "2014-06-25"

Hopefully this helps keep your query under the character limit as well.
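In case it helps, here is a minimal sketch of that kind of consolidated load using the google-cloud-bigquery Python client; the bucket, path, and table names are placeholders of mine, not from this answer:

    from google.cloud import bigquery

    client = bigquery.Client()

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,  # or supply the schema the files share
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    # A wildcard URI loads every gzip file into the same destination table,
    # so one table (or a few) replaces thousands of per-file tables.
    load_job = client.load_table_from_uri(
        "gs://my-bucket/user-exports/*.json.gz",
        "my_project.my_dataset.all_users",
        job_config=job_config,
    )
    load_job.result()  # wait for the load job to finish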

Upvotes: 3
