Reputation: 12818
I am enabling BI Engine to speed up my queries and to save cost for my project in the EU region.
What would be a good choice for setting the reservation size? 1GB, 2GB, 4GB?
How do I make that decision?
Upvotes: 1
Views: 1036
Below is a SQL script that groups queries by the number of GB processed: the first row covers queries that processed 0 to 1 GB each, the second row 1 to 2 GB, and so on.
For every row it then shows the amount processed, the amount billed, the associated cost and the cost saved.
This should help you see where your costs lie, how many queries you have of a certain size, and whether you should increase or lower the size of your reservation.
Please note that BI Engine can only speed up certain SELECT queries, not MERGE, INSERT, CREATE, etc. statements, and there are more exceptions. So for a fair comparison, I'm excluding those types of queries to get a better insight into the size of the savings. See also: https://cloud.google.com/bigquery/docs/bi-engine-intro#bi-engine-use-cases
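To make the cost arithmetic in the script concrete, here is a small Python sketch of the same per-query calculation it performs (the function name and the $5/TB rate are illustrative; use your region's on-demand price):

```python
# Per-query cost arithmetic mirrored from the SQL script (illustrative values).
QUERY_COST_PER_TB = 5.00  # on-demand price per TB scanned; adjust to your region


def query_cost_stats(total_bytes_processed: int, total_bytes_billed: int) -> dict:
    """Hypothetical helper mirroring the columns computed in the SQL script."""
    gb_processed = total_bytes_processed / 1024**3
    return {
        "GB_processed": gb_processed,
        "GB_processed_floor": int(gb_processed),  # bucket: 0-1 GB, 1-2 GB, ...
        "GB_billed": total_bytes_billed / 1024**3,
        "expected_cost": total_bytes_processed / 1024**4 * QUERY_COST_PER_TB,
        "actual_cost": total_bytes_billed / 1024**4 * QUERY_COST_PER_TB,
    }


# A 1.5 GiB scan fully served by BI Engine bills 0 bytes,
# so its entire expected cost shows up as savings:
stats = query_cost_stats(total_bytes_processed=1_610_612_736, total_bytes_billed=0)
```

The `saved_cost` column in the script is just the sum of `expected_cost - actual_cost` over all queries in a bucket.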
DECLARE QUERY_COST_PER_TB NUMERIC DEFAULT 5.00; -- current cost of processing 1 TB of data in BQ; adjust to your region's on-demand price
with possible_bi_engine_jobs_incl_parent_jobs as (
select
creation_time,
bi_engine_statistics,
cache_hit,
total_bytes_processed / power(1024, 3) GB_processed,
floor(total_bytes_processed / power(1024, 3)) GB_processed_floor,
total_bytes_billed / power(1024, 3) GB_billed,
total_bytes_processed / power(1024, 4) * QUERY_COST_PER_TB expected_cost_in_euros,
total_bytes_billed / power(1024, 4) * QUERY_COST_PER_TB actual_cost_in_euros,
query,
job_id,
parent_job_id,
user_email,
job_type,
statement_type
from `my_project_id.region-eu.INFORMATION_SCHEMA.JOBS`
where 1=1
and creation_time >= '2022-12-08'
and creation_time < '2022-12-09'
and cache_hit = false -- bi engine will not be improving on queries that are already cache hits
and total_bytes_processed is not null -- if there's no bytes processed, then ignore the job
and statement_type = 'SELECT' -- statement types such as MERGE, CREATE, UPDATE cannot be run by bi engine, only SELECT statements
and job_type = 'QUERY' -- LOAD jobs etc. cannot be run by bi engine, only QUERY jobs
and upper(query) like '%FROM%' -- query should contain FROM, otherwise it will not be run by bi engine
and upper(query) not like '%INFORMATION_SCHEMA.%' -- metadata queries can not be run by bi engine
),
-- to prevent double counting of total_bytes_processed and total_bytes_billed
parent_job_ids_to_ignore as (
select distinct parent_job_id
from possible_bi_engine_jobs_incl_parent_jobs
where parent_job_id is not null
),
possible_bi_engine_jobs_excl_parent_jobs as (
select *
from possible_bi_engine_jobs_incl_parent_jobs
where job_id not in (select parent_job_id from parent_job_ids_to_ignore) -- to prevent double counting of total_bytes_processed and total_bytes_billed
)
select
GB_processed_floor, -- all queries whose GB processed falls between this floor value and floor + 1
count(1) query_count,
sum(case when bi_engine_statistics.bi_engine_mode in ('FULL', 'PARTIAL') then 1 else 0 end) bi_engine_enabled,
sum(case when bi_engine_statistics.bi_engine_mode in ('DISABLED') or bi_engine_statistics.bi_engine_mode IS NULL then 1 else 0 end) bi_engine_disabled,
round(sum(GB_processed), 1) GB_processed,
round(sum(GB_billed), 1) GB_billed,
round(sum(expected_cost_in_euros), 2) expected_cost_in_euros,
round(sum(actual_cost_in_euros), 2) actual_cost_in_euros,
round(sum(expected_cost_in_euros) - sum(actual_cost_in_euros), 2) saved_cost
from possible_bi_engine_jobs_excl_parent_jobs
group by GB_processed_floor
order by GB_processed_floor
;
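Once you have the daily savings per bucket from the query above, the reservation-size decision becomes a break-even comparison: the on-demand cost you avoid must exceed the cost of the reservation itself. A rough sketch of that check (the per-GiB-hour price below is an assumption for illustration; look up the BI Engine capacity price for your region):

```python
# Rough break-even check: is a BI Engine reservation of a given size worth it?
# ASSUMPTION: illustrative capacity price; check the BigQuery pricing page for your region.
BI_ENGINE_PRICE_PER_GIB_HOUR = 0.0416


def reservation_worth_it(reservation_gib: float, daily_on_demand_savings: float) -> bool:
    """Compare the daily reservation cost with the daily on-demand savings it unlocks."""
    daily_reservation_cost = reservation_gib * BI_ENGINE_PRICE_PER_GIB_HOUR * 24
    return daily_on_demand_savings > daily_reservation_cost


# e.g. at this price a 2 GiB reservation costs about 2 * 0.0416 * 24 ≈ $2/day,
# so it only pays off if the saved_cost column sums to more than that per day.
```

Run the savings query over a representative period (not a single day) before committing, and pick the smallest reservation whose savings clear this bar.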
This results in a cost savings table grouped by the size of the queries.
Other useful links on BI Engine savings:
Upvotes: 2