Reputation: 125
Possibly a simple question, but when querying a partitioned, would the query run a bit faster if I were to filter between the dates or just say where date >= a certain date? Or would it not really make much of a difference?
For example,
select *
from `project.dataset.table`
where date >= date_sub('2021-10-01',interval 30 day)
or
select *
from `project.dataset.table`
where date between '2021-09-01' and '2021-10-01'
Upvotes: 0
Views: 610
Reputation: 3034
The between will be more performant, because it will process less data, as it is able to prune more partitions. For example in your >= scenario, if the dates went all the way to 2022 then your would be including more data than in the scenario of an explicit between.
You can see an example using this query below. Paste it into the BQ UI and it should display how much data will be processed. Then comment out the first criteria and uncomment the second and see how much data gets processed then.
SELECT *
FROM `bigquery-public-data.wikipedia.pageviews_2021`
WHERE
--processes 241.9 GB
date(datehour) >= date_sub('2021-10-01',interval 30 day)
-- processes 178.4 GB
-- date(datehour) between '2021-09-01' and '2021-10-01'
;
I will say though that if you do not have much in the way of data volume then it may not make much of a difference.
Upvotes: 1