Reputation: 1423
I'm trying to write a Merge query in Google BigQuery (part of an ETL process).
I have Source (staging) and Target tables and I have 2 ways of merge the data: the classic 'Upsert' Merge OR Insert new row if not matched all columns.
This is an example of the first way (the classic 'Upsert') query:
MERGE DS.Target T
USING DS.Source S
ON T.Key=S.Key
WHEN NOT MATCHED THEN
INSERT ROW
WHEN MATCHED THEN
UPDATE SET Col1 = S.Col1, Col2 = S.Col2
in that way if the key exist it always updates the values of the cols even if value are the same. also this will work only if the key is not Nullable.
The other way of doing it is to inserting new row when values not matched:
MERGE DS.Target T
USING DS.Source S
ON T.A = S.A and T.B = S.B and T.C = S.C
WHEN NOT MATCHED THEN
INSERT ROW
I prefer this way, BUT I found that its not possible when column type is NULL, because NULL != NULL and then the condition is false when values are Null.
I can't find a proper way of writing this query and handle Nulls comparison.
It's not possible to check for Nulls at the merge condition, Ex:
ON ((T.A IS NULL and S.A IS NULL) or T.A = S.A)
WHEN NOT MATCHED THEN
INSERT ROW
Error message:
RIGHT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
It's not possible also to use the Target table reference at the WHERE
clause, Ex:
ON T.A = S.A
WHEN NOT MATCHED AND
S.A IS NOT NULL AND T.A IS NOT NULL
THEN
INSERT ROW
What do you suggest? Also, lets say both ways are possible, what would be more cost effective by BQ? I guess the performance should be the same. I also assume that I can ignore the insertions cost. Thanks!
Upvotes: 8
Views: 12676
Reputation: 59225
Can you use a "magic" number or id?
This works:
CREATE OR REPLACE TABLE temp.target AS
SELECT * FROM UNNEST(
[STRUCT(1 AS A, 2 AS B, 3 AS C, 5 AS d)
, (null, 1, 3, 500)
]);
CREATE OR REPLACE TABLE temp.source AS
SELECT * FROM UNNEST(
[STRUCT(1 AS A, 2 AS B, 3 AS C, 100 AS d)
, (1, 1, 1, 1000)
, (null, null, null, 10000)
, (null, 1, 3, 10000)
]);
MERGE temp.target T
USING temp.source S
ON IFNULL(T.A, -9999999) = IFNULL(S.A, -9999999) and IFNULL(T.B, -9999999) = IFNULL(S.B, -9999999) and IFNULL(T.C, -9999999) = IFNULL(S.C, -9999999)
WHEN NOT MATCHED THEN
INSERT ROW;
Upvotes: 3