Guy P
Guy P

Reputation: 1423

BigQuery Merge - Insert new rows if not mached

I'm trying to write a Merge query in Google BigQuery (part of an ETL process).

I have Source (staging) and Target tables and I have 2 ways of merge the data: the classic 'Upsert' Merge OR Insert new row if not matched all columns.

This is an example of the first way (the classic 'Upsert') query:

MERGE DS.Target T
USING DS.Source S
ON T.Key=S.Key    
WHEN NOT MATCHED THEN
  INSERT ROW
WHEN MATCHED THEN
  UPDATE SET Col1 = S.Col1, Col2 = S.Col2

in that way if the key exist it always updates the values of the cols even if value are the same. also this will work only if the key is not Nullable.

The other way of doing it is to inserting new row when values not matched:

MERGE DS.Target T
USING DS.Source S
ON T.A = S.A and T.B = S.B and T.C = S.C
WHEN NOT MATCHED THEN
INSERT ROW

I prefer this way, BUT I found that its not possible when column type is NULL, because NULL != NULL and then the condition is false when values are Null.

I can't find a proper way of writing this query and handle Nulls comparison.

It's not possible to check for Nulls at the merge condition, Ex:

ON ((T.A IS NULL and S.A IS NULL) or T.A = S.A)
WHEN NOT MATCHED THEN
INSERT ROW

Error message:

RIGHT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

It's not possible also to use the Target table reference at the WHERE clause, Ex:

ON T.A = S.A
WHEN NOT MATCHED AND 
   S.A IS NOT NULL AND T.A IS NOT NULL
THEN
   INSERT ROW

What do you suggest? Also, lets say both ways are possible, what would be more cost effective by BQ? I guess the performance should be the same. I also assume that I can ignore the insertions cost. Thanks!

Upvotes: 8

Views: 12676

Answers (1)

Felipe Hoffa
Felipe Hoffa

Reputation: 59225

Can you use a "magic" number or id?

This works:

CREATE OR REPLACE TABLE temp.target AS
SELECT * FROM UNNEST(
  [STRUCT(1 AS A, 2 AS B, 3 AS C, 5 AS d)
   , (null, 1, 3, 500)
  ]);
CREATE OR REPLACE TABLE temp.source AS
SELECT * FROM UNNEST(
  [STRUCT(1 AS A, 2 AS B, 3 AS C, 100 AS d)
   , (1, 1, 1, 1000)
   , (null, null, null, 10000)
   , (null, 1, 3, 10000)
  ]);

enter image description here

enter image description here

MERGE temp.target T
USING temp.source S
ON IFNULL(T.A, -9999999) = IFNULL(S.A, -9999999) and IFNULL(T.B, -9999999) = IFNULL(S.B, -9999999) and IFNULL(T.C, -9999999) = IFNULL(S.C, -9999999)
WHEN NOT MATCHED THEN
INSERT ROW;

enter image description here

Upvotes: 3

Related Questions