Reputation: 73
I am trying to create a Snowflake task that executes a MERGE statement.
However, it seems that Snowflake does not recognize the “when not matched by target” or “when not matched by source” statements.
create or replace task MERGE_TEAM_TOUCHPOINT
warehouse = COMPUTE_WH
schedule = '1 minute'
when system$stream_has_data('TEAMTOUCHPOINT_CDC')
as
merge into dv.Team_Touchpoint as f using TeamTouchpoint_CDC as s
on s.uniqueid = f.uniqueid
when matched then
update set TEAMUNIQUEID = s.TEAMUNIQUEID,
TOUCHPOINTUNIQUEID = s.TOUCHPOINTUNIQUEID
when not matched by target then
insert (
ID,
UniqueID,
TEAMUNIQUEID,
TOUCHPOINTUNIQUEID
)
values (
s.ID,
s.UniqueID,
s.TEAMUNIQUEID,
s.TOUCHPOINTUNIQUEID
)
when not matched by source then delete;
How can I do this? Is there really no other way than creating a stored procedure in javascript to first truncate the table and then insert everything from the staging table?
Upvotes: 7
Views: 6061
Reputation: 59305
A workaround suggested by a teammate:
MATCHED_BY_SOURCE
based on a full join, then look if a.col or b.col are null:merge into TARGET t
using (
select <COLUMN_LIST>,
iff(a.COL is null, 'NOT_MATCHED_BY_SOURCE', 'MATCHED_BY_SOURCE') SOURCE_MATCH,
iff(b.COL is null, 'NOT_MATCHED_BY_TARGET', 'MATCHED_BY_TARGET') TARGET_MATCH
from SOURCE a
full join TARGET b
on a.COL = b.COL
) s
on s.COL = t.COL
when matched and s.SOURCE_MATCH = 'NOT_MATCHED_BY_SOURCE' then
<DO_SOMETHING>
when matched and s.TARGET_MATCH = 'NOT_MATCHED_BY_TARGET' then
<DO_SOMETHING_ELSE>
;
(same as in https://stackoverflow.com/a/69095225/132438)
Upvotes: 7
Reputation: 590
Neither 'by target' nor 'by source' are valid keywords within the MERGE command of Snowflake and the Matching is meant to be 'by target' only (https://docs.snowflake.com/en/sql-reference/sql/merge.html). To achieve your goal you need to run the DELETE separately from the MERGE - in which you will be able to run the UPDATE (when MATCHED) and the INSERT (when NOT MATCHED "by target"), as in fact the DELETE can be handled by the MERGE only WHEN MATCHED "by target".
You could handle the two steps (1.DELETE; 2.MERGE-UPDATE&INSERT) within a single explicit transaction in a Stored Procedure, or two different transactions via two separate Tasks, one of which being an AFTER Task.
Alternatively, you can run an INSERT with the optional parameter OVERWRITE which will run a TRUNCATE of the target table and a subsequent loading from the source table, all in a single transaction: https://docs.snowflake.com/en/sql-reference/sql/insert.html#optional-parameters
Here is a reproducible example of the DELETE + MERGE(UPDATE&INSERT) approach:
USE DEV;
CREATE OR REPLACE TEMPORARY TABLE Public.My_Merge_Target (
Id INTEGER, Name VARCHAR
)
AS
SELECT column1, column2
FROM (VALUES (1, 'Stay as is'), (2, 'This name has to change'), (3, 'This needs to go'));
CREATE OR REPLACE TEMPORARY TABLE Public.My_Merge_Source (
Id INTEGER, Name VARCHAR
)
AS
SELECT column1, column2
FROM (VALUES (1, 'Stay as is'), (2, 'This is the new name for id=2'), (4, 'A new row'));
SELECT * FROM Public.My_Merge_Target ORDER BY Id;
/*
------------------------------------
Id | Name
------------------------------------
1 | Stay as is
2 | This name has to change
3 | This needs to go
*/
SELECT * FROM Public.My_Merge_Source ORDER BY Id;
/*
------------------------------------
Id | Name
------------------------------------
1 | Stay as is
2 | This is the new name for id=2
4 | A new row
*/
DELETE FROM Public.My_Merge_Target AS trg
USING (
SELECT t.Id FROM Public.My_Merge_Source AS s
RIGHT JOIN Public.My_Merge_Target AS t
ON s.Id = t.Id
WHERE s.Id IS NULL
) AS src
WHERE trg.Id = src.Id;
/*
-----------------------
number of rows deleted
-----------------------
1
-----------------------
*/
SELECT * FROM Public.My_Merge_Target ORDER BY Id;
/*
------------------------------------
Id | Name
------------------------------------
1 | Stay as is
2 | This is the new name
*/
MERGE
INTO Public.My_Merge_Target AS trg
USING (
SELECT Id, Name
FROM Public.My_Merge_Source
) AS src
ON
trg.Id = src.Id
WHEN
MATCHED
AND (src.Name != trg.Name) THEN UPDATE
SET Name = src.Name
WHEN
NOT MATCHED THEN INSERT (Id, Name)
VALUES (src.Id, src.Name)
;
/*
-------------------------------------------------
number of rows inserted | number of rows updated
-------------------------------------------------
1 | 1
-------------------------------------------------
*/
SELECT * FROM Public.My_Merge_Target ORDER BY Id;
/*
------------------------------------
Id | Name
------------------------------------
1 | Stay as is
2 | This is the new name for id=2
4 | A new row
*/
Upvotes: 4