denim
denim

Reputation: 463

BigQuery - BI Engine measuring savings

I have deployed BI Engine in one of my Google projects and I am measuring cost savings using the following query

with tbl
as
(
    select creation_time, total_bytes_processed, total_bytes_billed,    
           5 * (total_bytes_processed / 1000000000000) as cost_projected, 
           5 * (total_bytes_billed / 1000000000000) as cost_actual

      from `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT b
     where 1=1
       and job_type = "QUERY"
       and creation_time >= '2022-05-10 11:30:00.000 UTC'
       and creation_time <= '2022-05-10 19:00:00.000 UTC'
)
select sum(cost_projected) - sum(cost_actual) as savings
  from tbl
 where 1=1
;

However, I noticed that very often I have accelerated queries (bi_engine_statistics.bi_engine_mode = 'FULL') for which 'total_bytes_billed = total_bytes_processed'. I was expecting that for accelerated queries total_bytes_billed should be equal to zero which does not seem to be the case.

So the questions are:

  1. Is my query the correct way of measuring savings,
  2. Is it normal to have fully accelerated queries with total_bytes_billed > 0?

Upvotes: 1

Views: 576

Answers (2)

Philipp Heinrich
Philipp Heinrich

Reputation: 11

Regarding your second question: I noticed that too and discussed it with Google a while ago, please find an excerpt of their answer below:

during join, one side of the join was reading from the table and another side from shuffle. As BI Engine does not support reading from shuffle, those stages are excluded from full/partial reporting.

Currently, this is not correctly represented in the information schema, but it is billed correctly. This is the (public fcticket https://issuetracker.google.com/issues/274899698?pli=1

Upvotes: 1

Sander van den Oord
Sander van den Oord

Reputation: 12818

  1. To answer your question on calculation of savings:
    I think it is the correct way of measuring your savings, but some queries of type QUERY cannot be used for BI Engine, so it's somewhat unfair to keep counting them in.
    Which is why I wrote the script in this SO question:
    BigQuery BI Engine: how to choose a good reservation size?
    Also you could improve on converting bytes to TB by calculating as follows:
    sum(total_bytes_processed) / pow(1024, 4) AS TB_processed

  1. As for your question on FULL mode and still having to pay:
    I have turned BI Engine on and if I run this query on my data, I get 0 results, so all my bi_engine_mode='FULL' queries have savings:
SELECT 
    total_bytes_processed, 
    total_bytes_billed, 
    bi_engine_statistics
FROM `my_project_id.region-eu.INFORMATION_SCHEMA.JOBS`
WHERE 1=1
    and bi_engine_statistics.bi_engine_mode = 'FULL'
    and total_bytes_processed = total_bytes_billed
    and total_bytes_processed > 0

Upvotes: 0

Related Questions