Reputation: 1033
Snowflake has a very usefule construct
MERGE INTO ... WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...
A very commomn situation is when you only need to update when the incoming data is newer and discard it if not. Is there a way to do it in one statement or I would need to write a stored procedure that would read the existing row and decide in the code whether to merge the incoming row.
Upvotes: 1
Views: 877
Reputation: 175726
Assuming we want to achieve SCD 1 - overwrite then we could use MERGE
like:
MERGE INTO tab trg
USING src_tab src
ON trg.business_key = src.business_key
WHEN NOT MATCHED
THEN INSERT(business_key, col1, col2)
VALUES (src.business_key, src.col1, src.col2)
WHEN MATCHED
-- if nothing has change for non-key attribute, do not update
AND HASH(trg.col1,trg.col2) != HASH(src.col1,src.col2)
THEN UPDATE SET trg.col1 = src.col1
,trg.col2 = src.col2
,trg.sys_load_date = CURRENT_TIMESTAMP();
Sample scenario:
CREATE OR REPLACE TABLE tab(sys_key INT IDENTITY(1,1),
business_key INT NOT NULL,
col1 STRING,
col2 STRING,
sys_load_date DATETIME DEFAULT CURRENT_TIMESTAMP());
CREATE OR REPLACE TABLE src_tab(business_key INT, col1 STRING, col2 STRING);
Sample run:
-- initial data
INSERT OVERWRITE INTO src_tab VALUES (10, NULL, NULL), (20, 'a','a');
--MERGE
SELECT * FROM tab ORDER BY sys_key;
/*
SYS_KEY BUSINESS_KEY COL1 COL2 SYS_LOAD_DATE
1 10 2021-05-11 11:34:31.337
2 20 a a 2021-05-11 11:34:31.337
*/
-- first row does not change, new row
INSERT OVERWRITE INTO src_tab VALUES (10, NULL, NULL), (30, 'b','b');
--MERGE
SELECT * FROM tab ORDER BY sys_key;
/*
SYS_KEY BUSINESS_KEY COL1 COL2 SYS_LOAD_DATE
1 10 2021-05-11 11:34:31.337
2 20 a a 2021-05-11 11:34:31.337
4 30 b b 2021-05-11 11:34:47.298
*/
-- change row 1
INSERT OVERWRITE INTO src_tab VALUES (10, 'x', NULL);
-- MERGE
SELECT * FROM tab ORDER BY sys_key;
/*
SYS_KEY BUSINESS_KEY COL1 COL2 SYS_LOAD_DATE
1 10 x 2021-05-11 11:35:32.122
2 20 a a 2021-05-11 11:34:31.337
4 30 b b 2021-05-11 11:34:47.298
*/
Upvotes: 2
Reputation: 9788
You can add additional logic to the WHEN MATCHED clause using an AND clause:
WHEN MATCHED [ AND <case_predicate> ] THEN ...
Upvotes: 1