Mike Davis
Mike Davis

Reputation: 31

BigQuery Merge - Size of Query expands with DELETE clause

When attempting a MERGE statement, BigQuery is only scanning the requested partitions UNTIL the DELETE statment is added, at which point it reverts to scanning the whole dataset (blossoming from 1GB to >1TB in this case).

Is there a way to use the full features of MERGE, including DELETE, without incurring the extra cost?

Generic sample that matches my effort below:

MERGE target_table AS t *## All Dates, partitioned on 
activity_date*

USING source_table AS s ## one date, only yesterday

ON t.field_a = s.field_a

AND t.activity_date >= 
DATE_ADD(DATE(current_timestamp(),'America/Los_Angeles'), INTERVAL -1 DAY) ## use partition to limit to yesterday

WHEN MATCHED
THEN UPDATE SET
field_b = s.field_b

WHEN NOT MATCHED
THEN INSERT
(field_a, field_b)
VALUES
(field_a, field_b)

WHEN NOT MATCHED BY SOURCE
THEN DELETE

Upvotes: 3

Views: 311

Answers (1)

Gurkomal
Gurkomal

Reputation: 166

Based on the query you have provided, it is not expected behavior for it to apply the merge on the whole dataset. After the query has run, you should analyze your dataset and check its validity to ensure that the query only ran on the specific partitions.

If, after further inspection, no unexpected changes were made to your dataset, the 1 TB of data noted may be simply explained as BigQuery ingesting that data into memory as a side step to be able to run the query.

However, to confirm it is recommended to submit a ticket in the issue tracker with your BigQuery JobID so that BigQuery engineering can properly inspect the issue.

Upvotes: 3

Related Questions