Adam Hey
Adam Hey

Reputation: 1691

BigQuery query execution costs

I know there is documentation on BigQuery pricing, but I am confused by which value they charge you on. When you compose a query, the editor with show This query will process 69.3 GB when run. but when you've executed the query, there is a Job Information tab next to the Results tab. In that Job Information, there are two values: "Bytes Process" and "Bytes Billed"

I was informed that you are charged on the "Bytes Billed" value (seems logical based on the name!).

What's causing my confusion is that the Bytes Billed for the above 69.3GB query is 472MB. I'm given to believe that the WHERE clause does not impact pricing

Why is it so much less? How can I accurately estimate query costs if I can't see the Bytes Billed beforehand?

Thanks in advance

Edit 1 Here is my query:

SELECT 
  timestamp_trunc(DateTimeUTC, SECOND) as DateTimeUTC,
  ANY_VALUE(if(Code = 'Aftrtmnt_1_Scr_Cat_Tank_Level', value, null)) as Aftrtmnt_1_Scr_Cat_Tank_Level,
  ANY_VALUE(if(Code = 'ctv_ds_ect', value, null)) as ctv_ds_ect,
  ANY_VALUE(if(Code = 'Engine_Coolant_Level', value, null)) as Engine_Coolant_Level,
  ANY_VALUE(if(Code = 'ctv_batt_volt_min', value, null)) as ctv_batt_volt_min,
  ANY_VALUE(if(Code = 'ctv_moderate_brake_count', value, null)) as ctv_moderate_brake_count,
  ANY_VALUE(if(Code = 'ctv_amber_lamp_count', value, null)) as ctv_amber_lamp_count,
  VIN,
  ANY_VALUE(if(Code = 'ctv_trip_distance_miles', value, null)) as ctv_trip_distance_miles,

 
FROM `xxxx.yyyy.zzzz` 
WHERE
  DATE(DateTimeUTC) > '2021-03-01') and  DATE(DateTimeUTC) < '2021-06-01' and 
  Code in ('Aftrtmnt_1_Scr_Cat_Tank_Level', 'ctv_ds_ect', 'Engine_Coolant_Level', 'ctv_trip_distance_miles',  'ctv_batt_volt_min',  'ctv_moderate_brake_count', 'ctv_amber_lamp_count')
  and event_name = 'Trip Detail'
  group by timestamp_trunc(DateTimeUTC, SECOND), VIN

Essentially it just pivots the main table and the intention is to insert the result into another table

THis article states that the WHERE clause does not impact cost, which is different to what I previously thought

Upvotes: 1

Views: 875

Answers (1)

rtenha
rtenha

Reputation: 3638

I believe that your actual cost should never be more than estimated, but could be less.

Consider a table that is both partitioned and clustered. Let's assume the partition is on a date field my_date and clustered on a string field my_type.

Then, consider the following query...

select my_date, my_type from <table>

The estimate thinks you are scanning both columns in their entirety, and so your billing should match the estimate

However, if you filter against the partition, you should see a reduction in both the estimation and the billed amount.

select my_date, my_type from <table> where my_date = '2021-06-17'

But, if you filter against the clustered column, I don't believe the estimate evaluates that filter, because it doesn't know what you are filtering, just which columns. However, when you execute the query, you do get the benefit of the clustering, because it won't actually scan the entire column, just the relevant clusters.

select my_date, my_type from <table> where my_type = 'A'

It is not checking 'A' against the clustering in the estimation. Consider a case where 'A' doesn't exist in that clustered column, the estimator would show an estimate, but you would actually scan 0 Bytes when you execute.

Upvotes: 1

Related Questions