ThibaultC
ThibaultC

Reputation: 51

Declare Big Query Variable with Scheduled Query and Destination Table

I use a scheduled query in Big Query which appends data from the previous day to a Big Query table. The data from the previous day is not always available when my query runs, so, to make sure that I have all the data, I need to calculate the last date available in my Big Query table.

My first attempt was to write the following query :

SELECT *
FROM sourceTable
WHERE date >= (SELECT Max(date) from destinationTable)

When I run this query, only date >= max(date) is correctly exported. However, the query processes the entire sourceTable, and not only J - max(date). Therefore, the cost is higher than expected.

I also tried to declare a variable using "DECLARE" & "SET" (https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting). This solution works fine and only J - max(date) is processed. However, BQ interprets a query with "DECLARE" as a script, so the results can't be exported automatically to a BQ table using scheduled queries.

DECLARE maxDate date;
SET maxDate = (SELECT Max(date) from destinationTable); 
SELECT *
FROM sourceTable
WHERE date >= maxDate

Is there another way of doing what I would like? Or a way to declare a variable using "DECLARE" & "SET" in a scheduled query with a destination table?

Thanks!

Upvotes: 5

Views: 10121

Answers (3)

Vikash Pareek
Vikash Pareek

Reputation: 1181

We still can't use scheduled scripting in BigQuery along with destination table.

But if your source table is date sharded then there is a workaround that can be used to achieve the desired solution (only required data scan based on an initial value from another table).

SELECT
  *
FROM
  sourceTable*
WHERE
  _TABLE_SUFFIX >= (
  SELECT
    IFNULL(MAX(date),
      '<default_date>')
  FROM
    destinationTable)

This will scan only shards that are greater than or equal to the maximum date of the destination table.

P.S. - source table is date sharded.

Upvotes: -1

Yun Zhang
Yun Zhang

Reputation: 5503

Scripting query, when being scheduled, doesn't support setting a destination table for now. You need to use DDL/DML to make change to existing table.

DECLARE maxDate date;

SET maxDate = (SELECT Max(date) from destinationTable); 

CREATE OR REPLACE destinationTable AS
  SELECT *
  FROM sourceTable
  WHERE date >= maxDate

Upvotes: 1

vinoaj
vinoaj

Reputation: 1674

Is destinationTable partitioned? If not, can you recreate it as a partitioned table? If it is a partitioned table, and partitioned on the destinationTable.date column, you could do something like:

SELECT *
FROM sourceTable
WHERE date >= (SELECT MAX(_PARTITIONTIME) from destinationTable)

Since _PARTITIONTIME is a pseudo-column, there is no cost in running the subquery.

Upvotes: -1

Related Questions