Shahid Thaika
Shahid Thaika

Reputation: 2305

Bigquery estimates (and bills) query cost incorrectly in partitioned table

I have a table similar to the following

CREATE TABLE `main.viewings`
(
  event_time TIMESTAMP,
  other_columns INT64,
    more_columns STRING
)
PARTITION BY DATE(event_time)
OPTIONS(
  partition_expiration_days=365
);

I then aggregate every new day's data and append that into a reporting table similar to below

DECLARE from_event_time TIMESTAMP DEFAULT (SELECT TIMESTAMP(DATE_ADD(IFNULL(MAX(`date`), '2022-10-31'), INTERVAL 1 DAY)) FROM main.`reporting_table`);
DECLARE to_event_time TIMESTAMP DEFAULT TIMESTAMP(CURRENT_DATE());

SELECT DISTINCT DATE(event_time) AS `date`
FROM main.`viewings`
WHERE
    event_time >= from_event_time
    AND event_time < to_event_time;

Due to some reason, bigquery incorrectly estimates and bills me for the entire size of the viewings table. If I hard code the values for from_event_time and to_event_time, then it correctly estimates a much smaller value.

What's more perplexing, if I only have event_time >= from_event_time in the WHERE condition, then too it estimates it correctly. Only when I add event_time < to_event_time, it starts messing up.

Has anyone faced something similar?

Upvotes: 0

Views: 259

Answers (2)

Shahid Thaika
Shahid Thaika

Reputation: 2305

While what David mentioned is true, I did some testing and declaring the variable as below makes it work somehow. It doesn't estimate right though, but on execution, it charges me the actual (smaller) query cost.

I was able to discover this adjustment to the query by re-creating the table, in staging, and using require_partition_filter = TRUE in the partition options. Initially it said there was no event_time filter, but after adjusting the query, to ensure the query can see it, the same query works in production.

DECLARE from_event_time TIMESTAMP DEFAULT TIMESTAMP( (SELECT DATE_ADD(IFNULL(MAX(`date`), '2022-10-31'), INTERVAL 1 DAY) FROM main.`reporting_table`) );

Upvotes: 1

David Morales
David Morales

Reputation: 670

That is the expected behaviour. Bigquery will prune partitions only under certain conditions, otherwise a full scan is required.

For example, using dynamic values for the partition field.

More information: https://cloud.google.com/bigquery/docs/querying-partitioned-tables#use_a_constant_filter_expression

In your case, this query will prune partitions and it is in fact an example in the docs.

DECLARE to_event_time TIMESTAMP DEFAULT TIMESTAMP(CURRENT_DATE());

But the first one is dynamic triggering then a full scan.

Be careful about exceptions while querying partitioned tables.

Upvotes: 1

Related Questions