Mim

Reputation: 1069

BigQuery: filter on the last date and use the partition

I asked how to filter on the last date and got excellent answers (BigQuery, how to use alias in where clause?). They all work, but they scan the whole table. The field SETTLEMENTDATE is the partitioning field. Is there a way to scan only one partition?

As an example, I am using this query:

#standardSQL
SELECT * EXCEPT(isLastDate) 
FROM (
  SELECT *, DATE(SETTLEMENTDATE) = MAX(DATE(SETTLEMENTDATE)) OVER() isLastDate
  FROM `biengine-252003.aemo2.daily`
)
WHERE isLastDate 

Edit: please note that the last date is not always the current date, as there is a lag in the data.

Upvotes: 1

Views: 1487

Answers (3)

Elliott Brossard

Reputation: 33745

Now that scripting is in beta in BigQuery, you can declare a variable that contains the target date. Here's an example:

DECLARE max_date DATE DEFAULT (SELECT DATE(MAX(datehour)) FROM `fh-bigquery.wikipedia_v3.pageviews_2019` WHERE wiki='es');

SELECT MAX(views)
FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
WHERE DATE(datehour) = max_date
AND wiki='es'
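
The same two-step pattern can be sketched against the table from the question. This is only a sketch under assumptions: it assumes SETTLEMENTDATE is a TIMESTAMP or DATETIME column (the question wraps it in DATE(), which suggests so); if it is already a DATE, the DATE() calls can be dropped. The first statement still has to read the SETTLEMENTDATE column across the table to find the maximum, but the second statement filters on a known value, so it should be able to read a single partition.

```sql
-- Step 1: find the latest date present in the table.
-- This touches only the SETTLEMENTDATE column, not every column.
DECLARE max_date DATE DEFAULT (
  SELECT MAX(DATE(SETTLEMENTDATE))
  FROM `biengine-252003.aemo2.daily`
);

-- Step 2: filter on the variable, so the partition filter is a
-- known value rather than the result of a subquery or window function.
SELECT *
FROM `biengine-252003.aemo2.daily`
WHERE DATE(SETTLEMENTDATE) = max_date;
```

Both statements have to be submitted together as one script, since the variable only exists for the duration of that script.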

Upvotes: 3

Felipe Hoffa

Reputation: 59175

Mikhail's answer looks like this (working on public data):

SELECT MAX(views)
FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
WHERE DATE(datehour) = DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)     
AND wiki='es' 
# 122.2 MB processed

But it seems the question wants something like this:

SELECT MAX(views)
FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
WHERE DATE(datehour) = (SELECT DATE(MAX(datehour)) FROM `fh-bigquery.wikipedia_v3.pageviews_2019` WHERE wiki='es')     
AND wiki='es'
# 50.6 GB processed

... but for way less than 50.6GB

What you need now is some sort of scripting, to perform this in 2 steps:

max_date = (SELECT DATE(MAX(datehour)) FROM `fh-bigquery.wikipedia_v3.pageviews_2019` WHERE wiki='es');

SELECT MAX(views)
FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
WHERE DATE(datehour) = {{max_date}}
AND wiki='es'
# 115.2 MB processed

You will have to script this outside BigQuery - or wait for news on https://issuetracker.google.com/issues/36955074.

Upvotes: 1

Mikhail Berlyant

Reputation: 173003

Assuming SETTLEMENTDATE is of the DATE data type, you can use the query below to read only today's partition:

SELECT *
FROM `biengine-252003.aemo2.daily`
WHERE SETTLEMENTDATE = CURRENT_DATE()     

Or, for example, to read only yesterday's partition:

SELECT *
FROM `biengine-252003.aemo2.daily`
WHERE SETTLEMENTDATE = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)     
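
Because the question notes that the data lags, a fixed offset may miss the last loaded date. One possible variation, sketched here under assumptions, is to restrict the scan to a recent window with a constant filter and then pick the last date inside that window. The 7-day window is a hypothetical upper bound on the lag, not something stated in the question, and SETTLEMENTDATE is again assumed to be of the DATE data type.

```sql
#standardSQL
SELECT * EXCEPT(isLastDate)
FROM (
  -- Flag the rows that fall on the latest date found inside the window.
  SELECT *, SETTLEMENTDATE = MAX(SETTLEMENTDATE) OVER() AS isLastDate
  FROM `biengine-252003.aemo2.daily`
  -- Constant filter on the partitioning column: only the last 7 daily
  -- partitions are candidates (7 is an assumed maximum lag).
  WHERE SETTLEMENTDATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
)
WHERE isLastDate
```

This reads up to a week of partitions rather than exactly one, and it returns no rows if nothing was loaded within the window, so the interval should be chosen to cover the longest expected lag.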

See more at https://cloud.google.com/bigquery/docs/querying-partitioned-tables#querying_partitioned_tables_2

Upvotes: 1
