Reputation: 51
I have two tables as below.
Source table:
ID | EventDate | Updated_at | metre | Active_flag |
---|---|---|---|---|
1004 | 2022-03-10 | 2022-03-15 | 13 | Y |
1005 | 2022-03-18 | 2022-03-18 | 50 | Y |
1006 | 2022-03-15 | 2022-03-15 | 10 | Y |
1007 | 2022-03-20 | 2022-03-20 | 1 | Y |
Target table:
ID | EventDate | Updated_at | metre | Active_flag |
---|---|---|---|---|
1001 | 2022-01-01 | 2022-01-01 | 10 | Y |
1002 | 2022-01-02 | 2022-01-02 | 15 | Y |
1003 | 2022-03-01 | 2022-03-01 | 20 | Y |
1004 | 2022-03-10 | 2022-03-10 | 10 | N |
1004 | 2022-03-10 | 2022-03-15 | 13 | Y |
1005 | 2022-03-18 | 2022-03-18 | 5 | Y |
I need to do the Update and Insert (NOT DELETE) to the Target table for the Date>='2022-03-01'. The comparison should be based on ID and Updated_at. These are the fields coming from API.
Case 1: I need to Update Active_flag in Target Table when the record is not in Source. In this example is, ID= 1003 (as the other records don't meet the Date range filter) should be updated to Active_flage='N'. ID=1004 and Updated_as=2022-03-10 are not matching, and in Target table should have Active_flag='N'(which already has).
Case 2: If they match ID and Updated_at, I can leave them as it is.(Like ID=1004 and updated_at=2022-03-15 Or ID=1005 and Updatyed_at=2022-03-18).
Case 3: If they don't match and the ID is not in Target, I want the records to be INSERTED. Like ID=1006 and ID=1007.
The desired Target table after the Merge should be:
ID | EventDate | Updated_at | metre | Active_flag |
---|---|---|---|---|
1001 | 2022-01-01 | 2022-01-01 | 10 | Y |
1002 | 2022-01-02 | 2022-01-02 | 15 | Y |
1003 | 2022-03-01 | 2022-03-01 | 20 | N |
1004 | 2022-03-10 | 2022-03-10 | 10 | N |
1004 | 2022-03-10 | 2022-03-15 | 13 | Y |
1005 | 2022-03-15 | 2022-03-15 | 5 | Y |
1006 | 2022-03-15 | 2022-03-15 | 10 | Y |
1007 | 2022-03-20 | 2022-03-20 | 1 | Y |
My question: I could achieve that by using Left and right joins and using two different Tasks in Snowflake, but I just wanted to know if I can achieve that by using MERGE in Snowflake in one Task?
I have tried this:
merge into "TEST"."PUBLIC"."Target" T
using (
SELECT b."ID",b."Updated_at",
iff(a."ID" is null, 'NOT_MATCHED_BY_SOURCE', 'MATCHED_BY_SOURCE') SOURCE_MATCH,
iff(b."ID" is null, 'NOT_MATCHED_BY_TARGET', 'MATCHED_BY_TARGET') TARGET_MATCH
from "TEST"."PUBLIC"."Source" a
full join "TEST"."PUBLIC"."Target" b
on a."ID" = b."ID" and a."Updated_at" = b."Updated_at") S
on S."ID" = T."ID" and S."Updated_at"= T."Updated_at"
WHEN matched and S.SOURCE_MATCH = 'NOT_MATCHED_BY_SOURCE' and T."EventDate">='2022-03-01' then
UPDATE SET T."Active_flag" = 'N'
WHEN not matched and S.TARGET_MATCH='NOT_MATCHED_BY_TARGET' THEN
INSERT (T."ID",T."EventDate",T."Updated_at",T."metre",T."Active_flag") Values (a."ID",a."EventDate",a."Updated_at",a."metre",a."Active_flag")
;
The update part works fine as expected but I get an error on the INSERT part of the code as below: SQL compilation error: error line 14 at position 85 invalid identifier 'A.ID'
Thank you in advance for your guidance and help.
Upvotes: 2
Views: 5479
Reputation: 25903
Data Setup:
CREATE table src_table(d int, date date, updated_at date, metre int, active_flag boolean);
CREATE table trg_table(d int, date date, updated_at date, metre int, active_flag boolean);
INSERT into src_table values
(1004, '2022-03-10','2022-03-15', 13, true),
(1005, '2022-03-18','2022-03-18', 50, true),
(1006, '2022-03-15','2022-03-15', 10, true),
(1007, '2022-03-20','2022-03-20', 1, true);
INSERT into trg_table values
(1001,'2022-01-01','2022-01-01',10 ,true ),
(1002,'2022-01-02','2022-01-02',15 ,true ),
(1003,'2022-03-01','2022-03-01',20 ,true ),
(1004,'2022-03-10','2022-03-10',10 ,false),
(1004,'2022-03-10','2022-03-15',13 ,true ),
(1005,'2022-03-15','2022-03-15',5, true );
core logic:
doing an outer join, we find the matches and non matches, which means we can workout the "stale rows" that need deactivating
SELECT t.d as td, s.*
FROM (SELECT * FROM trg_table where active_flag and date >= '2022-03-01') t
FULL OUTER JOIN src_table s ON t.d = s.d
order by 1;
the core woven into the MERGE which only rule 1 and 3 need work:
MERGE INTO trg_table t using (
SELECT t.d as td, s.*
FROM (
SELECT *
FROM trg_table
where active_flag and date >= '2022-03-01'
) t
FULL OUTER JOIN src_table s
ON t.d = s.d
) s on s.td = t.d
WHEN MATCHED AND t.date >= '2022-03-01' AND s.d is null then update set active_flag = false
WHEN NOT MATCHED THEN INSERT values (s.d, s.date, s.updated_at, s.metre, s.active_flag);
gives:
D | DATE | UPDATED_AT | METRE | ACTIVE_FLAG |
---|---|---|---|---|
1,001 | 2022-01-01 | 2022-01-01 | 10 | TRUE |
1,002 | 2022-01-02 | 2022-01-02 | 15 | TRUE |
1,003 | 2022-03-01 | 2022-03-01 | 20 | FALSE |
1,004 | 2022-03-10 | 2022-03-10 | 10 | FALSE |
1,004 | 2022-03-10 | 2022-03-15 | 13 | TRUE |
1,005 | 2022-03-15 | 2022-03-15 | 5 | TRUE |
1,006 | 2022-03-15 | 2022-03-15 | 10 | TRUE |
1,007 | 2022-03-20 | 2022-03-20 | 1 | TRUE |
Upvotes: 2