Sander van den Oord
Sander van den Oord

Reputation: 12818

BigQuery BI Engine: how to choose a good reservation size?

I am enabling BI Engine to speed up my queries and to save cost for my project in the EU region.
What would be a good choice for setting the reservation size? 1GB, 2GB, 4GB?
How do I make that decision?

Upvotes: 1

Views: 1036

Answers (1)

Sander van den Oord
Sander van den Oord

Reputation: 12818

Below is a SQL script that groups queries in amount of GB processed, so first row is 0 to 1 GB processed per query, second row is 1 to 2 GB processed, etc.
And then for every row it shows the amount processed, amount billed and the costs associated and the costs saved.

This should help you in seeing where your costs lie, how many queries you have of a certain size and whether you may increase the size of your reservation of lower it.

Please note that BI Engine only can speed up certain SELECT QUERY and not MERGE, INSERT, CREATE etc statements. And there's more exceptions. So for fair comparison, I'm excluding those types of queries to have a better insight on the size of savings. See also: https://cloud.google.com/bigquery/docs/bi-engine-intro#bi-engine-use-cases

DECLARE QUERY_COST_PER_TB NUMERIC DEFAULT 5.00; -- current cost in dollars of processing 1 TB of data in BQ

with possible_bi_engine_jobs_incl_parent_jobs as (
    select 
        creation_time, 
        bi_engine_statistics,
        cache_hit,
        total_bytes_processed / power(1024, 3) GB_processed,
        floor(total_bytes_processed / power(1024, 3)) GB_processed_floor,
        total_bytes_billed / power(1024, 3) GB_billed, 
        total_bytes_processed / power(1024, 4) * QUERY_COST_PER_TB expected_cost_in_euros,
        total_bytes_billed / power(1024, 4) * QUERY_COST_PER_TB actual_cost_in_euros,
        query,
        job_id,
        parent_job_id,
        user_email,
        job_type,
        statement_type,
    from `my_project_id.region-eu.INFORMATION_SCHEMA.JOBS`
    where 1=1
        and creation_time >= '2022-12-08'
        and creation_time <  '2022-12-09'
        and cache_hit = false  -- bi engine will not be improving on queries that are already cache hits
        and total_bytes_processed is not null  -- if there's no bytes processed, then ignore the job
        and statement_type = 'SELECT'  -- statement types such as MERGE, CREATE, UPDATE cannot be run by bi engine, only SELECT statements
        and job_type = 'QUERY'  -- LOAD jobs etc. cannot be run by bi engine, only QUERY jobs 
        and upper(query) like '%FROM%'  -- query should contain FROM, otherwise it will not be run by bi engine
        and upper(query) not like '%INFORMATION_SCHEMA.%'  -- metadata queries can not be run by bi engine
),
-- to prevent double counting of total_bytes_processed and total_bytes_billed
parent_job_ids_to_ignore as (
    select distinct parent_job_id
    from possible_bi_engine_jobs_incl_parent_jobs
    where parent_job_id is not null
),
possible_bi_engine_jobs_excl_parent_jobs as (
    select *
    from possible_bi_engine_jobs_incl_parent_jobs
    where job_id not in (select parent_job_id from parent_job_ids_to_ignore) -- to prevent double counting of total_bytes_processed and total_bytes_billed
)
select 
    GB_processed_floor, -- all queries which processed less GB than the floor value
    count(1) query_count, 
    sum(case when bi_engine_statistics.bi_engine_mode in ('FULL', 'PARTIAL') then 1 else 0 end) bi_engine_enabled,
    sum(case when bi_engine_statistics.bi_engine_mode in ('DISABLED') or bi_engine_statistics.bi_engine_mode IS NULL then 1 else 0 end) bi_engine_disabled,
    round(sum(GB_processed), 1) GB_processed, 
    round(sum(GB_billed), 1) GB_billed,
    round(sum(expected_cost_in_euros), 2) expected_cost_in_euros,
    round(sum(actual_cost_in_euros), 2) actual_cost_in_euros,
    round(sum(expected_cost_in_euros) - sum(actual_cost_in_euros), 2) saved_cost 
from possible_bi_engine_jobs_excl_parent_jobs
group by GB_processed_floor
order by GB_processed_floor
;

This results in the following cost savings table grouped by size of the queries:

table of cost savings per query size in GB

Other useful links on BI Engine savings:

Upvotes: 2

Related Questions