HybridFenix
HybridFenix

Reputation: 125

Date filtering optimization

Possibly a simple question, but when querying a partitioned, would the query run a bit faster if I were to filter between the dates or just say where date >= a certain date? Or would it not really make much of a difference?

For example,

select *

from `project.dataset.table`

where date >= date_sub('2021-10-01',interval 30 day)

or

select *

from `project.dataset.table`

where date between '2021-09-01' and '2021-10-01'

Upvotes: 0

Views: 610

Answers (1)

Daniel Zagales
Daniel Zagales

Reputation: 3034

The between will be more performant, because it will process less data, as it is able to prune more partitions. For example in your >= scenario, if the dates went all the way to 2022 then your would be including more data than in the scenario of an explicit between.

You can see an example using this query below. Paste it into the BQ UI and it should display how much data will be processed. Then comment out the first criteria and uncomment the second and see how much data gets processed then.

SELECT * 
FROM `bigquery-public-data.wikipedia.pageviews_2021` 
WHERE 
--processes 241.9 GB
date(datehour) >= date_sub('2021-10-01',interval 30 day)
-- processes 178.4 GB
-- date(datehour) between '2021-09-01' and '2021-10-01'
;

I will say though that if you do not have much in the way of data volume then it may not make much of a difference.

Upvotes: 1

Related Questions