Naveen Thimmappa

Reputation: 21

Compare two Dataframe and run "Update Else Insert" in Pyspark

I am new to Python and couldn't find the exact answer I am looking for in other blogs, hence I am posting this as a new question.

I have two DataFrames, created as below:

df_hive.show() 
+--------+----------+-------+ 
|BATCH_ID|SRC_SYS_ID|ACT_IND| 
+--------+----------+-------+ 
|     100|      SYS1|      N| 
|     101|      SYS2|      N| 
|     102|      SYS3|      N| 
|     103|      SYS4|      Y| 
+--------+----------+-------+

df_orc.show() 
+--------+----------+-------+ 
|BATCH_ID|SRC_SYS_ID|ACT_IND| 
+--------+----------+-------+ 
|      99|      SYS0|      N|
|     100|      SYS1|      N| 
|     101|      SYS2|      N| 
|     102|      SYS3|      Y| 
+--------+----------+-------+

Expected Results for "df_orc"

+--------+----------+-------+ 
|BATCH_ID|SRC_SYS_ID|ACT_IND| 
+--------+----------+-------+ 
|      99|      SYS0|      N|
|     100|      SYS1|      N| 
|     101|      SYS2|      N| 
|     102|      SYS3|      N| 
|     103|      SYS4|      Y| 
+--------+----------+-------+

What I am trying to achieve is to join the two DataFrames on BATCH_ID and SRC_SYS_ID: if a match is found (and ACT_IND differs), update ACT_IND in the second DataFrame "df_orc"; if no match is found, insert the row into "df_orc" as a new record.
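To make the merge rule concrete, here is a plain-Python sketch of the "update else insert" semantics, no Spark required. The row values are the sample data above; the variable names (`hive_rows`, `orc_rows`, `merged`) are purely illustrative, and the key is assumed to be the (BATCH_ID, SRC_SYS_ID) pair:

```python
# Sketch of "update else insert" keyed on (BATCH_ID, SRC_SYS_ID).
# Illustrative only -- the real data lives in Spark DataFrames.
hive_rows = [(100, 'SYS1', 'N'), (101, 'SYS2', 'N'),
             (102, 'SYS3', 'N'), (103, 'SYS4', 'Y')]
orc_rows = [(99, 'SYS0', 'N'), (100, 'SYS1', 'N'),
            (101, 'SYS2', 'N'), (102, 'SYS3', 'Y')]

# Start from df_orc, then let df_hive overwrite matches and add new keys
merged = {(b, s): a for b, s, a in orc_rows}
for b, s, a in hive_rows:
    merged[(b, s)] = a          # update on match, insert otherwise

result = sorted((b, s, a) for (b, s), a in merged.items())
# result == [(99, 'SYS0', 'N'), (100, 'SYS1', 'N'), (101, 'SYS2', 'N'),
#            (102, 'SYS3', 'N'), (103, 'SYS4', 'Y')]
```

Note that BATCH_ID 102 ends up with 'N' (the df_hive value), which is the expected result shown above.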

Appreciate your help in advance.

Upvotes: 1

Views: 299

Answers (1)

Venu N

Reputation: 11

valuesA = [(100,'SYS1','N'),(101,'SYS2','N'),(102,'SYS3','N'),(103,'SYS4','Y')]
TableA = spark.createDataFrame(valuesA, ['BATCH_ID','SRC_SYS_ID','ACT_IND'])

valuesB = [(99,'SYS0','N'),(100,'SYS1','N'),(101,'SYS2','N'),(102,'SYS3','Y')]
TableB = spark.createDataFrame(valuesB, ['BATCH_ID','SRC_SYS_ID','ACT_IND'])

ta = TableA.alias('ta')
tb = TableB.alias('tb')

# BATCH_IDs present only in TableB -- rows to keep as-is,
# since TableA's version wins wherever the key matches
diff = (tb.select('BATCH_ID')
        .subtract(ta.select('BATCH_ID'))
        .rdd.map(lambda x: x[0]).collect())

# Take every row from TableA (the "update" side), then append the
# TableB rows whose BATCH_IDs TableA does not have (the "insert" side)
ta.union(tb[tb.BATCH_ID.isin(diff)]).orderBy('BATCH_ID').show()

Upvotes: 1
