Vahid
Vahid

Reputation: 51

Merge in Snowflake- Not matched ,Update and Insert

I have two tables as below.

Source table:

ID EventDate Updated_at metre Active_flag
1004 2022-03-10 2022-03-15 13 Y
1005 2022-03-18 2022-03-18 50 Y
1006 2022-03-15 2022-03-15 10 Y
1007 2022-03-20 2022-03-20 1 Y

Target table:

ID EventDate Updated_at metre Active_flag
1001 2022-01-01 2022-01-01 10 Y
1002 2022-01-02 2022-01-02 15 Y
1003 2022-03-01 2022-03-01 20 Y
1004 2022-03-10 2022-03-10 10 N
1004 2022-03-10 2022-03-15 13 Y
1005 2022-03-18 2022-03-18 5 Y

I need to do the Update and Insert (NOT DELETE) to the Target table for the Date>='2022-03-01'. The comparison should be based on ID and Updated_at. These are the fields coming from API.

Case 1: I need to Update Active_flag in Target Table when the record is not in Source. In this example is, ID= 1003 (as the other records don't meet the Date range filter) should be updated to Active_flage='N'. ID=1004 and Updated_as=2022-03-10 are not matching, and in Target table should have Active_flag='N'(which already has).

Case 2: If they match ID and Updated_at, I can leave them as it is.(Like ID=1004 and updated_at=2022-03-15 Or ID=1005 and Updatyed_at=2022-03-18).

Case 3: If they don't match and the ID is not in Target, I want the records to be INSERTED. Like ID=1006 and ID=1007.

The desired Target table after the Merge should be:

ID EventDate Updated_at metre Active_flag
1001 2022-01-01 2022-01-01 10 Y
1002 2022-01-02 2022-01-02 15 Y
1003 2022-03-01 2022-03-01 20 N
1004 2022-03-10 2022-03-10 10 N
1004 2022-03-10 2022-03-15 13 Y
1005 2022-03-15 2022-03-15 5 Y
1006 2022-03-15 2022-03-15 10 Y
1007 2022-03-20 2022-03-20 1 Y

My question: I could achieve that by using Left and right joins and using two different Tasks in Snowflake, but I just wanted to know if I can achieve that by using MERGE in Snowflake in one Task?

I have tried this:

    merge into "TEST"."PUBLIC"."Target" T
using (
    SELECT b."ID",b."Updated_at",
    iff(a."ID" is null, 'NOT_MATCHED_BY_SOURCE', 'MATCHED_BY_SOURCE') SOURCE_MATCH,
    iff(b."ID" is null, 'NOT_MATCHED_BY_TARGET', 'MATCHED_BY_TARGET') TARGET_MATCH
    from  "TEST"."PUBLIC"."Source" a
    full join "TEST"."PUBLIC"."Target" b
    on a."ID" = b."ID" and a."Updated_at" = b."Updated_at") S
on S."ID" = T."ID" and S."Updated_at"= T."Updated_at"

WHEN matched  and S.SOURCE_MATCH = 'NOT_MATCHED_BY_SOURCE' and T."EventDate">='2022-03-01' then
    UPDATE SET T."Active_flag" = 'N'
WHEN not matched and S.TARGET_MATCH='NOT_MATCHED_BY_TARGET' THEN
    INSERT (T."ID",T."EventDate",T."Updated_at",T."metre",T."Active_flag") Values (a."ID",a."EventDate",a."Updated_at",a."metre",a."Active_flag")
;

The update part works fine as expected but I get an error on the INSERT part of the code as below: SQL compilation error: error line 14 at position 85 invalid identifier 'A.ID'

Thank you in advance for your guidance and help.

Upvotes: 2

Views: 5479

Answers (1)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25903

Data Setup:

CREATE table src_table(d int, date date, updated_at date, metre int, active_flag boolean);
CREATE table trg_table(d int, date date, updated_at date, metre int, active_flag boolean);

INSERT into src_table values 
(1004, '2022-03-10','2022-03-15', 13, true),
(1005,  '2022-03-18','2022-03-18', 50, true),
(1006,  '2022-03-15','2022-03-15', 10, true),
(1007,  '2022-03-20','2022-03-20', 1, true);

INSERT into trg_table values 
(1001,'2022-01-01','2022-01-01',10 ,true ),
(1002,'2022-01-02','2022-01-02',15 ,true ),
(1003,'2022-03-01','2022-03-01',20 ,true ),
(1004,'2022-03-10','2022-03-10',10 ,false),
(1004,'2022-03-10','2022-03-15',13 ,true ),
(1005,'2022-03-15','2022-03-15',5, true  );

core logic:

doing an outer join, we find the matches and non matches, which means we can workout the "stale rows" that need deactivating

SELECT t.d as td, s.*
FROM (SELECT * FROM trg_table where active_flag and date >= '2022-03-01') t
FULL OUTER JOIN src_table s ON t.d = s.d
order by 1;

the core woven into the MERGE which only rule 1 and 3 need work:

MERGE INTO trg_table t using (
    SELECT t.d as td, s.*
    FROM (
        SELECT * 
        FROM trg_table 
        where active_flag and date >= '2022-03-01'
    ) t
    FULL OUTER JOIN src_table s 
        ON t.d = s.d
) s on s.td = t.d 
WHEN MATCHED AND t.date >= '2022-03-01' AND s.d is null then update set active_flag = false
WHEN NOT MATCHED THEN INSERT values (s.d, s.date, s.updated_at, s.metre, s.active_flag);

gives:

D DATE UPDATED_AT METRE ACTIVE_FLAG
1,001 2022-01-01 2022-01-01 10 TRUE
1,002 2022-01-02 2022-01-02 15 TRUE
1,003 2022-03-01 2022-03-01 20 FALSE
1,004 2022-03-10 2022-03-10 10 FALSE
1,004 2022-03-10 2022-03-15 13 TRUE
1,005 2022-03-15 2022-03-15 5 TRUE
1,006 2022-03-15 2022-03-15 10 TRUE
1,007 2022-03-20 2022-03-20 1 TRUE

Upvotes: 2

Related Questions