Is MERGE syntax in BigQuery are scan entire tables?

I using MERGE syntax to entire table using two table, the source and the target. The task is to update old data in target and add new data from source in selected time (there's modified_time scan ). The problem is, i don't want the merge to scan entire rows and only focused on what i want to merge.

So I try to used this query:

  target trg
USING
  source src
ON
  trg.id = src.id
  AND trg.c_ctr = src.c_ctr
  AND DATE(trg.pt) = DATE(src.pt)
  AND DATE(trg.pt) >= "2019-12-12
WHEN 
  MATCHED AND DATE(src.pt) >= "2019-12-12 THEN 
UPDATE
  SET (...)
WHEN 
  NOT MATCHED AND DATE(src.pt) >= "2019-12-12" THEN
INSERT
  (...) VALUES (...)

When I try this query... it seems consume "pretty low" data (approx. 140MB) due i want to check from 25 days ago. I haven't try this query in big size tables. That something that i want to ask is, if i have give some condition to barrier the scan, is MERGE operation keep scan the entire table or just several rows covered by the partition? Any help are appreciated.

Upvotes: 1

Views: 3691

Answers (2)

Matt Kocak
Matt Kocak

Reputation: 808

A MERGE statement will indeed scan the entire target and/or source table if no filters are included.

In your case, I'm assuming that your AND DATE(trg.pt) >= "2019-12-12" conditions are intending to take advantage of the target table's partitioning, which should work. Although, it looks like you use it in both the search_condition and the merge_condition, which is unnecessary. Using it in either one of those should do.

Google's Updating partitioned table data using DML article talks about taking advantage of partitioning in your MERGE statements and gives several examples as well. Quoting from that article:

When you run a MERGE statement against a partitioned table, you can limit which partitions are scanned by including the partitioning column in either a subquery filter, a search_condition filter, or a merge_condition filter. Pruning can occur when scanning the source table or the target table, or both.

Upvotes: 0

Felipe Hoffa
Felipe Hoffa

Reputation: 59175

Checking the documentation:

DML pricing for partitioned tables

If there are only INSERT clauses in the MERGE statement, you are charged for the sum of bytes processed for all the columns referenced in all partitions scanned by the query.

If there is an UPDATE or DELETE clause in the MERGE statement, you are charged for the sum of the bytes processed for all the columns referenced in all partitions for the source tables scanned by the query + the sum of bytes for all columns in the updated, deleted or scanned partitions for the target table (at the time the MERGE starts).

The key part is for all the columns referenced in all partitions for the source tables scanned by the query. So if your filters scan only a few days of data, those filters will make sure your query will only be charged for those days.

Upvotes: 1

Related Questions