user2555515
user2555515

Reputation: 1033

Snowflake MERGE how to keep chronological order

Snowflake has a very usefule construct

MERGE INTO ...  WHEN MATCHED THEN UPDATE ...   WHEN NOT MATCHED THEN INSERT ...

A very commomn situation is when you only need to update when the incoming data is newer and discard it if not. Is there a way to do it in one statement or I would need to write a stored procedure that would read the existing row and decide in the code whether to merge the incoming row.

Upvotes: 1

Views: 877

Answers (2)

Lukasz Szozda
Lukasz Szozda

Reputation: 175726

Assuming we want to achieve SCD 1 - overwrite then we could use MERGE like:

MERGE INTO tab trg
USING src_tab src
  ON trg.business_key = src.business_key
WHEN NOT MATCHED
    THEN INSERT(business_key, col1, col2) 
         VALUES (src.business_key, src.col1, src.col2) 
WHEN MATCHED 
 -- if nothing has change for non-key attribute, do not update
 AND HASH(trg.col1,trg.col2) != HASH(src.col1,src.col2) 
     THEN UPDATE SET trg.col1 = src.col1
                    ,trg.col2 = src.col2
                    ,trg.sys_load_date = CURRENT_TIMESTAMP();

Sample scenario:

CREATE OR REPLACE TABLE tab(sys_key INT IDENTITY(1,1),
                            business_key INT NOT NULL,
                            col1 STRING,
                            col2 STRING,
                            sys_load_date DATETIME DEFAULT CURRENT_TIMESTAMP());
                            
CREATE OR REPLACE TABLE src_tab(business_key INT, col1 STRING, col2 STRING);

Sample run:

-- initial data                    
INSERT OVERWRITE INTO src_tab VALUES (10, NULL, NULL), (20, 'a','a');
--MERGE
SELECT * FROM tab ORDER BY sys_key;
/*
SYS_KEY BUSINESS_KEY    COL1    COL2    SYS_LOAD_DATE
1   10          2021-05-11 11:34:31.337
2   20  a   a   2021-05-11 11:34:31.337
*/

-- first row does not change, new row
INSERT OVERWRITE INTO src_tab VALUES (10, NULL, NULL), (30, 'b','b');
--MERGE
SELECT * FROM tab ORDER BY sys_key;
/*
SYS_KEY BUSINESS_KEY    COL1    COL2    SYS_LOAD_DATE
1   10          2021-05-11 11:34:31.337
2   20  a   a   2021-05-11 11:34:31.337
4   30  b   b   2021-05-11 11:34:47.298
*/

-- change row 1
INSERT OVERWRITE INTO src_tab VALUES (10, 'x', NULL);
-- MERGE
SELECT * FROM tab ORDER BY sys_key;
/*
SYS_KEY BUSINESS_KEY    COL1    COL2    SYS_LOAD_DATE
1   10  x       2021-05-11 11:35:32.122
2   20  a   a   2021-05-11 11:34:31.337
4   30  b   b   2021-05-11 11:34:47.298
*/

Upvotes: 2

NickW
NickW

Reputation: 9788

You can add additional logic to the WHEN MATCHED clause using an AND clause:

WHEN MATCHED [ AND <case_predicate> ] THEN ...

Upvotes: 1

Related Questions