Reputation: 1691
I know there is documentation on BigQuery pricing, but I am confused by which value they charge you on. When you compose a query, the editor with show This query will process 69.3 GB when run.
but when you've executed the query, there is a Job Information
tab next to the Results
tab. In that Job Information, there are two values: "Bytes Process" and "Bytes Billed"
I was informed that you are charged on the "Bytes Billed" value (seems logical based on the name!).
What's causing my confusion is that the Bytes Billed for the above 69.3GB query is 472MB. I'm given to believe that the WHERE clause does not impact pricing
Why is it so much less? How can I accurately estimate query costs if I can't see the Bytes Billed beforehand?
Thanks in advance
Edit 1 Here is my query:
SELECT
timestamp_trunc(DateTimeUTC, SECOND) as DateTimeUTC,
ANY_VALUE(if(Code = 'Aftrtmnt_1_Scr_Cat_Tank_Level', value, null)) as Aftrtmnt_1_Scr_Cat_Tank_Level,
ANY_VALUE(if(Code = 'ctv_ds_ect', value, null)) as ctv_ds_ect,
ANY_VALUE(if(Code = 'Engine_Coolant_Level', value, null)) as Engine_Coolant_Level,
ANY_VALUE(if(Code = 'ctv_batt_volt_min', value, null)) as ctv_batt_volt_min,
ANY_VALUE(if(Code = 'ctv_moderate_brake_count', value, null)) as ctv_moderate_brake_count,
ANY_VALUE(if(Code = 'ctv_amber_lamp_count', value, null)) as ctv_amber_lamp_count,
VIN,
ANY_VALUE(if(Code = 'ctv_trip_distance_miles', value, null)) as ctv_trip_distance_miles,
FROM `xxxx.yyyy.zzzz`
WHERE
DATE(DateTimeUTC) > '2021-03-01') and DATE(DateTimeUTC) < '2021-06-01' and
Code in ('Aftrtmnt_1_Scr_Cat_Tank_Level', 'ctv_ds_ect', 'Engine_Coolant_Level', 'ctv_trip_distance_miles', 'ctv_batt_volt_min', 'ctv_moderate_brake_count', 'ctv_amber_lamp_count')
and event_name = 'Trip Detail'
group by timestamp_trunc(DateTimeUTC, SECOND), VIN
Essentially it just pivots the main table and the intention is to insert the result into another table
THis article states that the WHERE clause does not impact cost, which is different to what I previously thought
Upvotes: 1
Views: 875
Reputation: 3638
I believe that your actual cost should never be more than estimated, but could be less.
Consider a table that is both partitioned and clustered. Let's assume the partition is on a date field my_date
and clustered on a string field my_type
.
Then, consider the following query...
select my_date, my_type from <table>
The estimate thinks you are scanning both columns in their entirety, and so your billing should match the estimate
However, if you filter against the partition, you should see a reduction in both the estimation and the billed amount.
select my_date, my_type from <table> where my_date = '2021-06-17'
But, if you filter against the clustered column, I don't believe the estimate evaluates that filter, because it doesn't know what you are filtering, just which columns. However, when you execute the query, you do get the benefit of the clustering, because it won't actually scan the entire column, just the relevant clusters.
select my_date, my_type from <table> where my_type = 'A'
It is not checking 'A'
against the clustering in the estimation. Consider a case where 'A'
doesn't exist in that clustered column, the estimator would show an estimate, but you would actually scan 0 Bytes when you execute.
Upvotes: 1