Ardalan Shahgholi
Ardalan Shahgholi

Reputation: 12555

Merge in Spark SQL - WHEN NOT MATCHED BY SOURCE THEN

I am coding Python and Spark SQL in Databricks and I am using spark 2.4.5.

I have two tables.

Create table IF NOT EXISTS db_xsi_ed_faits_shahgholi_ardalan.Destination
(
  id Int,
  Name string,
  Deleted int
) USING Delta;

Create table IF NOT EXISTS db_xsi_ed_faits_shahgholi_ardalan.Source
(
  id Int,
  Name string,
  Deleted int
) USING Delta;

I need to ran a Merge command between my source and destination. I wrote below command

%sql
MERGE INTO db_xsi_ed_faits_shahgholi_ardalan.Destination AS D
USING db_xsi_ed_faits_shahgholi_ardalan.Source AS S
ON (S.id = D.id)
-- UPDATE
WHEN MATCHED AND S.Name <> D.Name THEN 
  UPDATE SET 
    D.Name = S.Name
-- INSERT    
WHEN NOT MATCHED THEN 
  INSERT (id, Name, Deleted)
  VALUES (S.id, S.Name, S.Deleted)
 -- DELETE
WHEN NOT MATCHED BY SOURCE THEN 
  UPDATE SET 
     D.Deleted = 1

When i ran this command i have below error:

enter image description here

It seems that we do not have NOT MATCHED BY SOURCE in spark! I need a solution to do that.

Upvotes: 6

Views: 6012

Answers (1)

Ardalan Shahgholi
Ardalan Shahgholi

Reputation: 12555

I wrote this code but still i am looking for better approach

%sql
MERGE INTO db_xsi_ed_faits_shahgholi_ardalan.Destination AS D
USING db_xsi_ed_faits_shahgholi_ardalan.Source AS S
ON (S.id = D.id)
-- UPDATE
WHEN MATCHED AND S.Name <> D.Name THEN 
  UPDATE SET 
    D.Name = S.Name
-- INSERT    
WHEN NOT MATCHED THEN 
  INSERT (id, Name, Deleted)
  VALUES (S.id, S.Name, S.Deleted)
;

%sql
-- Logical delete
UPDATE db_xsi_ed_faits_shahgholi_ardalan.Destination
  SET Deleted = 1
WHERE db_xsi_ed_faits_shahgholi_ardalan.Destination.id in
(
  SELECT
    D.id
  FROM db_xsi_ed_faits_shahgholi_ardalan.Destination AS D
  LEFT JOIN db_xsi_ed_faits_shahgholi_ardalan.Source AS S ON (S.id = D.id)
  WHERE S.id is null
) 

Upvotes: 3

Related Questions