Borislav Gizdov
Borislav Gizdov

Reputation: 1960

BigQuery with BI Engine is slower than BigQuery with cache

I've read almost all the threads about how to improve BigQuery performance, to retrieve data in milliseconds or at least under a second. I decided to use BI Engine for the purpose because it has seamless integration without code changes, it supports partitioning, smart offloading, real-time data, built-in compression, low latency, etc.

Unfortunately for the same query, I got a slower response time with the BI engine enabled, than just the query with cache enabled.

Average 691ms response time from BigQuery API

https://gist.github.com/bgizdov/b96c6c3d795f5f14e5e9a3e9d7091d85

Average 1605ms response time from BigQuery API.

finalExecutionDurationMs is about 200-300ms, but the total time to retrieve the data (just 8 rows) is 5-6 times more.

BigQuery UI: Elapsed 766ms, the actual time for their call to REST entity service is 1.50s. This explains why I get similar results.

https://gist.github.com/bgizdov/fcabcbce9f96cf7dc618298b2d69575d

I am using Quarkus with BigQuery integration and measuring the time for the query with Stopwatch by Guava.

The table is about 350MB, the BI reservation is 1GB. The returned rows are 8, aggregated from 300 rows. This is a very small data size with a simple query. I know BigQuery does not perform well with small data sizes, or it doesn't matter, but I want to get data for under a second, that's why I tried BI, and it will not improve with big datasets.

Upvotes: 1

Views: 656

Answers (1)

Alexey Klishin
Alexey Klishin

Reputation: 184

Could you please share job id?

BI Engine enables a number of optimizations, and for vast majority of queries they allow significantly faster and efficient processing.

However, there are corner cases when BI Engine optimizations are not as effective. One issue is initial loading of the data - we fetch data into RAM using optimal encoding, whereas BigQuery processes data directly. Subsequent queries should be faster. Another is - some operators are very easy to optimize to maximize CPU utilization (e.g. aggregations/filtering/compute), while others may be more tricky.

Upvotes: 0

Related Questions