Reputation: 11
Our main table has 92 MBytes with 410k lines. We are joining w/ another table that has 10KBytes and 378 lines. AppEngine Project Number: 897138483898 BigQuery Project Number: 1036887365938
We have a couple of queries. The average response time is 10 sec. w/ an average Max/Min ratio of 6.4. We have regular time-outs. The minimal response times are ok ranging from 1.1 to 5 sec w/ a mean of 3.7 sec.
If we have these variation of response times w/ 92 MB ... what will happen when we use real data?
What can we do?
Klaus
Upvotes: 1
Views: 580
Reputation: 11
thanks so much for your quick answer. As you suggested we SELECTed the old table into a new one, deleted the old and SELECTed again using the old name.
The average response time is now 2.2 sec, which is a speed-up of 4.5. The average Max/Min ratio is now 1.9 and thus the response time variation is reduced by a factor of 3.4. The maximal response time is 5 sec. and thus no more time-outs. The minimal response time ranges from 0.8 to 2.7 sec. w/ a mean of 1.6 sec.
Rebuilding a BigQuery seems to reduce the response time variation by 3.4 and speed-up the average response time by 4.5.
That's great! Thanks for your effective and timely help!
Klaus
Upvotes: 0
Reputation: 26617
First, I would like to say that this is much worse performance than I would expect. I looked up your query jobs in the BigQuery logs, and it looks like the issue you're seeing is with a highly-fragmented table. The queries you are doing (or at least one of the slow ones) reference a table that was created in more than 900 separate pieces over 4 days.
BigQuery has a coalesce process that runs periodically to compact tables like this one into fewer chunks, but it has been stalled for a few days. One option you have in the mean time is to copy the table (you can do this via the BigQuery UI, or via a Table Copy job) and use the copied version. This should cause BigQuery to generate a more compact representation.
Also, AppEngine HTTP requests time out after 10 seconds by default. You can increase this value. BigQuery the jobs.query() method also times out after the same time, which can also be increased. (I can look up the reference for how to do both of these if you think it would be helpful).
Upvotes: 1