Reputation: 97
There is a requirement to update only changed rows in an existing table compared to the created dataframe. So rightnow , i do subtract and get the changed rows, but not sure how to merge into existing table.
old_df = spark.sql("select * from existing table")
diff = new_df.subtract(old_df)
diff dataframe has to be now inserted(if new rows) or update existing records
(deltaTable.alias("full_df").merge(
merge_df.alias("append_df"),
"full_df.col1 = append_df.col1 OR full_df.col2 =append_df.col2")
.whenNotMatchedInsertAll()
.execute()
)
This is not updating existing records (case: col2 value changed ; col1 not changed)
Upvotes: 3
Views: 10928
Reputation: 1387
.whenMatchedUpdateAll()
accepts a condition which you can use to preserve unchanged rows:
(
deltaTable
.alias("full_df")
.merge(
merge_df.alias("append_df"),
"full_df.col1 = append_df.col1 OR full_df.col2 = append_df.col2")
.whenNotMatchedInsertAll()
.whenMatchedUpdateAll("full_df.col1 != append_df.col1 OR full_df.col2 != append_df.col2")
.execute()
)
Upvotes: 7