Big query - Schedule Query on an external partitioned table with the keyword @run_date

Because it is client data I've replace in this post the project name and the dataset name by ******)

I'm trying to create a new schedule query in BigQuery on Google cloud platform The problem is I've got this error in the web Query editor

Cannot query over table '******.raw_bounce_rate' without a filter over column(s) 'dt' that can be used for partition elimination 

The thing is I do filter on the column dt.

Here is the scheme of my external partitioned table

Tracking_Code   STRING  
Pages   STRING  NULLABLE    
Clicks_to_Page  INTEGER 
Path_Lengths    INTEGER 
Visit_Number    INTEGER 
Visitor_ID  STRING  
Mobile_Device_Type  STRING  
All_Visits  INTEGER 
dt  DATE    

dt is the field of the partition and I selected the option "Require partition filter"

Here is the simplify sql of my query

WITH yesterday_raw_bounce_rate AS (
  SELECT *
  FROM `******.raw_bounce_rate`
  WHERE dt = DATE_SUB(@run_date, INTERVAL 1 DAY)
),
entries_table as (
  SELECT dt,
    ifnull(Tracking_Code, "sans campagne") as tracking_code,
    ifnull(Pages, "page non trackée") as pages,
     Visitor_ID,
     Path_Lengths,
     Clicks_to_Page,
    SUM(all_visits) AS somme_visites
  FROM
    yesterday_raw_bounce_rate
  GROUP BY
    dt,
    Tracking_Code,
    Pages,
    Visitor_ID, 
    Path_Lengths,
    Clicks_to_Page
   HAVING
    somme_visites = 1 and Clicks_to_Page = 1
)
select * from entries_table

if I remove the statement

Clicks_to_Page = 1

or if I replace the

DATE_SUB(@run_date, INTERVAL 1 DAY)

by a hard coded date

the query is accepted by Big Query, it does not make sense to me

Upvotes: 1

Views: 573

Answers (1)

Alexandre Moraes
Alexandre Moraes

Reputation: 4042

Currently, there is an open issue, here. It addresses the error regarding using @run_date filter in the filter of scheduled queries to partitioned tables with required filter. The engineering team is currently working on it, although there is no ETA.

In your scheduled query, you can use one of the two workarounds using @run_date.As follows:

First option,

DECLARE runDateVariable DATE DEFAULT @run_date;
#your code...
WHERE date = DATE_SUB(runDateVariable, INTERVAL 1 DAY)

Second option,

DECLARE runDateVariable DATE DEFAULT CAST(@run_date AS DATE);
#your code...
WHERE date = DATE_SUB(runDateVariable, INTERVAL 1 DAY)

In addition, you can also use CURRENT_DATE() instead of @run_date, as shwon below:

DECLARE runDateVariable DATE DEFAULT CURRENT_DATE(); 
#your code...
WHERE date = DATE_SUB(runDateVariable, INTERVAL 1 DAY)

UPDATE

I have set up another scheduled query to run daily with a table partitioned by DATE from a field called date_formatted and the partition filter is required. Then I have set up a backfill, here, so I could see the result of the scheduled query for previous days. Below is the code I used:

DECLARE runDateVariable DATE DEFAULT @run_date;

SELECT @run_date as run_date, date_formatted, fullvisitorId FROM `project_id.dataset.table_name` WHERE date_formatted > DATE_SUB(runDateVariable, INTERVAL 1 DAY) 

Upvotes: 1

Related Questions