Reputation: 51
I use a scheduled query in Big Query which appends data from the previous day to a Big Query table. The data from the previous day is not always available when my query runs, so, to make sure that I have all the data, I need to calculate the last date available in my Big Query table.
My first attempt was to write the following query :
SELECT *
FROM sourceTable
WHERE date >= (SELECT Max(date) from destinationTable)
When I run this query, only date >= max(date) is correctly exported. However, the query processes the entire sourceTable, and not only J - max(date). Therefore, the cost is higher than expected.
I also tried to declare a variable using "DECLARE" & "SET" (https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting). This solution works fine and only J - max(date) is processed. However, BQ interprets a query with "DECLARE" as a script, so the results can't be exported automatically to a BQ table using scheduled queries.
DECLARE maxDate date;
SET maxDate = (SELECT Max(date) from destinationTable);
SELECT *
FROM sourceTable
WHERE date >= maxDate
Is there another way of doing what I would like? Or a way to declare a variable using "DECLARE" & "SET" in a scheduled query with a destination table?
Thanks!
Upvotes: 5
Views: 10121
Reputation: 1181
We still can't use scheduled scripting in BigQuery along with destination table.
But if your source table is date sharded then there is a workaround that can be used to achieve the desired solution (only required data scan based on an initial value from another table).
SELECT
*
FROM
sourceTable*
WHERE
_TABLE_SUFFIX >= (
SELECT
IFNULL(MAX(date),
'<default_date>')
FROM
destinationTable)
This will scan only shards that are greater than or equal to the maximum date of the destination table.
P.S. - source table is date sharded.
Upvotes: -1
Reputation: 5503
Scripting query, when being scheduled, doesn't support setting a destination table for now. You need to use DDL/DML to make change to existing table.
DECLARE maxDate date;
SET maxDate = (SELECT Max(date) from destinationTable);
CREATE OR REPLACE destinationTable AS
SELECT *
FROM sourceTable
WHERE date >= maxDate
Upvotes: 1
Reputation: 1674
Is destinationTable
partitioned? If not, can you recreate it as a partitioned table? If it is a partitioned table, and partitioned on the destinationTable.date
column, you could do something like:
SELECT *
FROM sourceTable
WHERE date >= (SELECT MAX(_PARTITIONTIME) from destinationTable)
Since _PARTITIONTIME
is a pseudo-column, there is no cost in running the subquery.
Upvotes: -1