Reputation: 870
Using PySpark, I am updating a MySQL table. The schema has a unique key constraint on three fields.
My Spark job runs three times a day. Since one of the columns in the unique key is 'date', I get a unique key constraint violation whenever the job runs more than once in a day.
Is there a way from Spark to delete the already-existing rows and insert new ones?
I searched the web but could not find a solution.
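For reference, this is roughly the kind of write that hits the constraint (a minimal sketch; the connection options, table, and column names below are placeholders, not my real ones):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("daily-load").getOrCreate()

# Illustrative daily output; 'load_date' is the date column that is part of
# the unique key together with two other columns (names are made up).
df = spark.createDataFrame(
    [("2019-05-20", "store_1", "sku_9", 42)],
    ["load_date", "store_id", "sku", "qty"],
)

# With mode("append"), re-running the job on the same day inserts the same
# (load_date, store_id, sku) combination again and MySQL rejects it with a
# duplicate-key error.
(
    df.write.format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/mydb")
    .option("dbtable", "daily_metrics")
    .option("user", "spark_user")
    .option("password", "...")
    .mode("append")
    .save()
)
```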
Upvotes: 0
Views: 1026
Reputation: 18003
Assuming df.write is being used, there isn't any upsert mode currently.
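For example (a minimal sketch, assuming `df` is the DataFrame from the question and placeholder connection details), the JDBC writer only accepts the standard save modes, none of which does an upsert:

```python
# Save modes are "append", "overwrite", "ignore" and "error"/"errorifexists".
# "append" re-inserts the rows (duplicate-key error on a second run) and
# "overwrite" replaces the whole table, losing the other dates.
(
    df.write.format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/mydb")
    .option("dbtable", "daily_metrics")
    .option("user", "spark_user")
    .option("password", "...")
    .mode("append")  # no "upsert" mode exists
    .save()
)
```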
Upvotes: 1
Reputation: 2200
You should update the table on the database side. My suggestion is to create a temporary table in the MySQL database and have the Spark job insert its data into that temporary table with overwrite mode.
Then write a MySQL update script that updates the target table from the temporary table, and add a step to the job chain after the Spark job to run that script, as sketched below.
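A minimal sketch of what that could look like, assuming `df` is the DataFrame from the question; the table and connection names are made up, and the second step is shown here with pymysql, but any job-chain step that runs the SQL against MySQL would work:

```python
# Step 1: Spark overwrites only the staging/temporary table.
(
    df.write.format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/mydb")
    .option("dbtable", "daily_metrics_staging")
    .option("user", "spark_user")
    .option("password", "...")
    .mode("overwrite")
    .save()
)

# Step 2: run after the Spark job, as the next step in the job chain.
# Upsert from the staging table into the real table so re-runs on the same
# date update the existing rows instead of violating the unique key.
import pymysql

conn = pymysql.connect(host="db-host", user="spark_user",
                       password="...", database="mydb")
try:
    with conn.cursor() as cur:
        cur.execute("""
            INSERT INTO daily_metrics (load_date, store_id, sku, qty)
            SELECT load_date, store_id, sku, qty
            FROM daily_metrics_staging
            ON DUPLICATE KEY UPDATE qty = VALUES(qty)
        """)
    conn.commit()
finally:
    conn.close()
```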
Upvotes: 1