Reputation: 870
Using PySpark, I am updating a MySQL table. The schema has a unique key constraint on three fields.
My Spark job runs three times a day. Since one of the columns in the unique key is 'date', I get a unique key constraint violation whenever the job runs more than once in a day.
Is there a way from Spark to delete the already-existing rows and insert new ones?
I searched the web but could not find a solution.
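For reference, this is roughly the kind of write that hits the constraint (a minimal sketch; the connection options, table, and column names below are placeholders, not my real ones):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("daily-load").getOrCreate()

# Illustrative daily output; 'load_date' is the date column that is part of
# the unique key together with two other columns (names are made up).
df = spark.createDataFrame(
    [("2019-05-20", "store_1", "sku_9", 42)],
    ["load_date", "store_id", "sku", "qty"],
)

# With mode("append"), re-running the job on the same day inserts the same
# (load_date, store_id, sku) combination again and MySQL rejects it with a
# duplicate-key error.
(
    df.write.format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/mydb")
    .option("dbtable", "daily_metrics")
    .option("user", "spark_user")
    .option("password", "...")
    .mode("append")
    .save()
)
```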
Upvotes: 0
Views: 1026
Reputation: 18003
Assuming df.write is being used, there isn't any upsert mode currently.
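For example (a minimal sketch, assuming `df` is the DataFrame from the question and placeholder connection details), the JDBC writer only accepts the standard save modes, none of which does an upsert:

```python
# Save modes are "append", "overwrite", "ignore" and "error"/"errorifexists".
# "append" re-inserts the rows (duplicate-key error on a second run) and
# "overwrite" replaces the whole table, losing the other dates.
(
    df.write.format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/mydb")
    .option("dbtable", "daily_metrics")
    .option("user", "spark_user")
    .option("password", "...")
    .mode("append")  # no "upsert" mode exists
    .save()
)
```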
Upvotes: 1
Reputation: 2200
You should update the table on the database side. My suggestion is to create a temporary table in the MySQL database and have the Spark job insert its data into that temporary table with overwrite mode.
Then write a MySQL update script that updates the target table from the temporary table, and add a step to the job chain after the Spark job to run that script, as sketched below.
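A minimal sketch of what that could look like, assuming `df` is the DataFrame from the question; the table and connection names are made up, and the second step is shown here with pymysql, but any job-chain step that runs the SQL against MySQL would work:

```python
# Step 1: Spark overwrites only the staging/temporary table.
(
    df.write.format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/mydb")
    .option("dbtable", "daily_metrics_staging")
    .option("user", "spark_user")
    .option("password", "...")
    .mode("overwrite")
    .save()
)

# Step 2: run after the Spark job, as the next step in the job chain.
# Upsert from the staging table into the real table so re-runs on the same
# date update the existing rows instead of violating the unique key.
import pymysql

conn = pymysql.connect(host="db-host", user="spark_user",
                       password="...", database="mydb")
try:
    with conn.cursor() as cur:
        cur.execute("""
            INSERT INTO daily_metrics (load_date, store_id, sku, qty)
            SELECT load_date, store_id, sku, qty
            FROM daily_metrics_staging
            ON DUPLICATE KEY UPDATE qty = VALUES(qty)
        """)
    conn.commit()
finally:
    conn.close()
```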
Upvotes: 1