Reputation: 2305
I have a table similar to the following
CREATE TABLE `main.viewings`
(
event_time TIMESTAMP,
other_columns INT64,
more_columns STRING
)
PARTITION BY DATE(event_time)
OPTIONS(
partition_expiration_days=365
);
I then aggregate every new day's data and append that into a reporting table similar to below
DECLARE from_event_time TIMESTAMP DEFAULT (SELECT TIMESTAMP(DATE_ADD(IFNULL(MAX(`date`), '2022-10-31'), INTERVAL 1 DAY)) FROM main.`reporting_table`);
DECLARE to_event_time TIMESTAMP DEFAULT TIMESTAMP(CURRENT_DATE());
SELECT DISTINCT DATE(event_time) AS `date`
FROM main.`viewings`
WHERE
event_time >= from_event_time
AND event_time < to_event_time;
Due to some reason, bigquery incorrectly estimates and bills me for the entire size of the viewings table. If I hard code the values for from_event_time and to_event_time, then it correctly estimates a much smaller value.
What's more perplexing, if I only have event_time >= from_event_time
in the WHERE condition, then too it estimates it correctly. Only when I add event_time < to_event_time
, it starts messing up.
Has anyone faced something similar?
Upvotes: 0
Views: 259
Reputation: 2305
While what David mentioned is true, I did some testing and declaring the variable as below makes it work somehow. It doesn't estimate right though, but on execution, it charges me the actual (smaller) query cost.
I was able to discover this adjustment to the query by re-creating the table, in staging, and using require_partition_filter = TRUE
in the partition options. Initially it said there was no event_time filter, but after adjusting the query, to ensure the query can see it, the same query works in production.
DECLARE from_event_time TIMESTAMP DEFAULT TIMESTAMP( (SELECT DATE_ADD(IFNULL(MAX(`date`), '2022-10-31'), INTERVAL 1 DAY) FROM main.`reporting_table`) );
Upvotes: 1
Reputation: 670
That is the expected behaviour. Bigquery will prune partitions only under certain conditions, otherwise a full scan is required.
For example, using dynamic values for the partition field.
More information: https://cloud.google.com/bigquery/docs/querying-partitioned-tables#use_a_constant_filter_expression
In your case, this query will prune partitions and it is in fact an example in the docs.
DECLARE to_event_time TIMESTAMP DEFAULT TIMESTAMP(CURRENT_DATE());
But the first one is dynamic triggering then a full scan.
Be careful about exceptions while querying partitioned tables.
Upvotes: 1