Vaibhav B

Reputation: 15

Delta table merge operation output shows an incorrect number of updated records?

I am performing a merge operation on my Delta table in Spark. The existing Delta table already has some records. I created another DataFrame from a CSV file, in which I added one new record and updated one existing record. Please see the screenshots below.

df_source is the temp view holding the updated data.

After performing the merge operation, the reported metrics do not look correct: they show 3 records updated, although I updated only one record. The inserted count is correct. My issue is with the update count: why is the merge updating all the records?

Can you please help me to understand what's happening behind the scenes.

[Image: delta table]

[Image: updated source file]

[Image: merge statement]

Upvotes: 0

Views: 1446

Answers (1)

Kartik Bhiwapurkar

Reputation: 5165

As per your merge statement, a record is updated whenever its ID exists in both tables. The output you are getting is therefore correct: every time the merge statement finds an ID in the target table that also exists in the source table, it updates that record, whether or not any values actually changed. Because three IDs match, 3 records are reported as updated.
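To make the counting concrete, here is a minimal pure-Python sketch of how the update and insert counts come about. It is only a toy model, not Delta Lake code: the function `merge_counts`, the `only_if_changed` flag, and the sample rows are all hypothetical, since the real table contents are only visible in the screenshots.

```python
def merge_counts(target, source, key="id", only_if_changed=False):
    """Toy model of merge metrics: returns (updated, inserted)."""
    target_by_key = {row[key]: row for row in target}
    updated = inserted = 0
    for row in source:
        existing = target_by_key.get(row[key])
        if existing is None:
            # No matching key in the target: counted as an insert.
            inserted += 1
        elif not only_if_changed or existing != row:
            # Key matched: counted as an update, even if values are identical,
            # unless the extra "only if changed" condition filters it out.
            updated += 1
    return updated, inserted


# Hypothetical data: three existing rows, one changed, one new.
target = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]
source = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "changed"},
    {"id": 4, "name": "d"},
]

print(merge_counts(target, source))                        # (3, 1)
print(merge_counts(target, source, only_if_changed=True))  # (1, 1)
```

With a match condition on the ID alone, all three matched rows count as updated; only an additional condition on the column values brings the count down to the one row that really changed.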

Note also that, as per the official documentation, if multiple source rows match the same target row, the update is considered ambiguous under the SQL semantics of merge, since it is not apparent which source row should be used to update the matched target row.

For your reference, please see the documentation link below:

https://learn.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/delta-merge-into
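If you want only the rows whose values actually differ to be updated, one possible approach is to add a condition to the matched clause. The sketch below assumes the `delta-spark` Python package is available; the helper names `changed_condition` and `merge_changed_only`, the aliases `t` and `s`, and the column names are hypothetical and need to be adapted to your table.

```python
def changed_condition(columns, target="t", source="s"):
    """Build a null-safe 'any column differs' predicate for the matched clause."""
    # <=> is Spark SQL's null-safe equality operator.
    return " OR ".join(f"NOT ({target}.{c} <=> {source}.{c})" for c in columns)


def merge_changed_only(spark, target_path, df_source, key, columns):
    """Merge df_source into the Delta table, updating only rows that changed."""
    from delta.tables import DeltaTable  # requires the delta-spark package

    (
        DeltaTable.forPath(spark, target_path).alias("t")
        .merge(df_source.alias("s"), f"t.{key} = s.{key}")
        .whenMatchedUpdateAll(condition=changed_condition(columns))
        .whenNotMatchedInsertAll()
        .execute()
    )


# Hypothetical column names, for illustration only.
print(changed_condition(["name", "city"]))
# NOT (t.name <=> s.name) OR NOT (t.city <=> s.city)
```

A call such as `merge_changed_only(spark, "/path/to/table", df_source, "id", ["name", "city"])` (path and columns are placeholders) should then leave matched rows with identical values untouched, so they are no longer counted as updated.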

Upvotes: 0
