Raghav salotra

Reputation: 870

Update MySQL rows using Spark

Using PySpark, I am updating a MySQL table. The schema has a composite unique key constraint on three fields.

My Spark job runs three times a day. Since one of the columns in the unique key is 'date', I get a unique key constraint violation error if I run the job more than once in a day.

Is there a way from Spark to delete the already-existing rows and insert new ones?
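For illustration, this is roughly the write that fails on a second run in the same day (the source path, connection details, and table name below are placeholders):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("daily-load").getOrCreate()
    df = spark.read.parquet("/path/to/todays_data")  # placeholder source

    # Appending a second time in the same day re-inserts rows with the
    # same (key1, key2, date) values, so MySQL raises a duplicate-key error.
    (df.write
       .format("jdbc")
       .option("url", "jdbc:mysql://host:3306/mydb")
       .option("dbtable", "target_table")
       .option("user", "user")
       .option("password", "password")
       .mode("append")
       .save())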

I searched the web but could not find a solution.

Upvotes: 0

Views: 1026

Answers (2)

Ged

Reputation: 18003

Assuming df.write is being used, there is currently no upsert mode.
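For reference, the DataFrameWriter save modes are "append", "overwrite", "ignore", and "error"/"errorifexists"; none of them performs an upsert. A minimal sketch of a JDBC write (URL, table, and credentials are placeholders):

    # None of the available save modes merges on the unique key;
    # "append" on a re-run hits the duplicate-key error.
    df.write.jdbc(
        url="jdbc:mysql://host:3306/mydb",  # placeholder URL
        table="target_table",               # placeholder table
        mode="append",
        properties={"user": "user", "password": "password"},
    )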

Upvotes: 1

Ali Yesilli

Reputation: 2200

You should handle the update on the database side. My suggestion is to create a temporary table in the MySQL database and have the Spark job insert data into that temporary table with overwrite mode.

Then write a MySQL update script that merges the temporary table into the target table, and add a step to the job chain after the Spark job to run that script, as sketched below.
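A minimal sketch of that two-step flow, assuming a staging table target_table_stage and a target table target_table with a composite unique key (all table and column names here are placeholders):

    # Step 1 (Spark): overwrite the staging table with this run's rows.
    (df.write
       .format("jdbc")
       .option("url", "jdbc:mysql://host:3306/mydb")
       .option("dbtable", "target_table_stage")
       .option("user", "user")
       .option("password", "password")
       .option("truncate", "true")  # empty the table instead of dropping it
       .mode("overwrite")
       .save())

    # Step 2 (MySQL, executed by the downstream job in the chain):
    # upsert from the staging table into the target table. VALUES()
    # refreshes the non-key column when the unique key already exists.
    upsert_sql = """
    INSERT INTO target_table (key1, key2, dt, metric)
    SELECT key1, key2, dt, metric FROM target_table_stage
    ON DUPLICATE KEY UPDATE metric = VALUES(metric);
    """

Because the MySQL script does the merge, re-running the Spark job in the same day simply refreshes the staging table and the upsert becomes idempotent.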

Upvotes: 1
