BigQuery MERGE statement billing more bytes than editor shows

Question

I have a very large (3.5B records) table that I want to update/insert (upsert) using the MERGE statement in BigQuery. The source table is a staging table that contains only the new data, and I need to check if the record with a corresponding ID is in the target table, updating the row if so or inserting if not.

The target table is partitioned by an integer field called IdParent, and the matching is done on IdParent and another integer field called IdChild. My merge statement/script looks like this:

declare parentList array<int64>;

set parentList = array(select distinct IdParent from dataset.Staging);

merge into dataset.Target t
using dataset.Staging s
on
  -- target is partitioned by IdParent, do this for partition pruning
  t.IdParent in unnest(parentList)
  and t.IdParent = s.IdParent
  and t.IdChild = s.IdChild
when matched and t.IdParent in unnest(parentList) then
  update
    set t.Column1 = s.Column1,
    t.Column2 = s.Column2,
    ...
when not matched and IdParent in unnest(parentList) then
  insert (...)
  values (...)


So I:

1. Pull the IdParent list from the staging table to know which partitions to prune.
2. Limit the partitions of the target table in the join predicate.
3. Also limit the partitions of the target table in the matched/not matched conditions.
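As a sanity check on the pruning steps above, the size of each partition in the target table can be listed from INFORMATION_SCHEMA.PARTITIONS. This is only a sketch: it assumes the table name is exactly 'Target', and it relies on my understanding that for integer-range partitioning partition_id holds the start of each range as a string, so the IDs only line up one-to-one with IdParent values if the range interval is 1.

```sql
-- Sketch: per-partition row counts and logical bytes for the target table.
-- Assumption: the table is dataset.Target, as in the script above.
select
  partition_id,          -- for integer-range partitioning: start of the range (string)
  total_rows,
  total_logical_bytes
from dataset.INFORMATION_SCHEMA.PARTITIONS
where table_name = 'Target'
order by total_logical_bytes desc;
```

Summing total_logical_bytes over the partitions that parentList should touch gives a figure to compare against both the editor's number and the bytes actually billed.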


The total size of dataset.Target is ~250GB. If I paste this script into the BQ editor and remove every IdParent in unnest(parentList) condition, the editor shows ~250GB to bill (as expected, since there is no partition pruning). If I add the conditions back so the script is exactly as shown above, i.e. attempting to prune partitions, the editor shows ~97MB to bill. However, when I look at the query results, I see that it actually billed ~180GB.
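For reference, the processed and billed bytes of the individual statements can be read from the jobs metadata rather than the results pane. A minimal sketch follows; the `region-us` qualifier and the one-day window are assumptions to adjust. Statements run as part of a script appear as child jobs with parent_job_id set.

```sql
-- Sketch: compare bytes processed vs. bytes billed for recent MERGE jobs.
-- Assumption: the jobs ran in the US multi-region; change `region-us` if not.
select
  job_id,
  parent_job_id,            -- populated for statements executed inside a script
  statement_type,
  total_bytes_processed,
  total_bytes_billed
from `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
where creation_time > timestamp_sub(current_timestamp(), interval 1 day)
  and statement_type = 'MERGE'
order by creation_time desc;
```

This shows whether the ~180GB comes from the MERGE itself or from another statement in the script, such as the one that builds parentList.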

The target table is also clustered on the two fields being matched, and I'm aware that the benefits of clustering are typically not reflected in the editor's estimate. However, my understanding is that clustering should only make the bytes billed smaller, so I can't think of any reason why this would happen.
Is this a BQ bug, or am I just missing something? BigQuery doesn't even say "the script is estimated to process XX MB"; it says "This will process XX MB", and then it processes far more.
