Reputation: 675
I have a simple Spark job that map, compute and write results into Oracle DB. I'm having a problem when writing the results into DB.
After reducing the results by key, I'm calling foreachPartition action to establish a connect and to write results into DB. If I set the parallelisation to 1, it works fine. BUT when I change the parallelisation of the reducer to 2 or more, it only writes partial results. when I checked the log files, I see this error:
java.sql.BatchUpdateException: ORA-00060: deadlock detected while waiting for resource
How could I resolve this issue?
Upvotes: 1
Views: 390
Reputation: 3608
Oracle is only going to deadlock when you have multiple writes to the same row (or a configuration that doesn't support your level of concurrency but I find this unlikely with 2 parallel writers.)
To get significant benefit from parallelism you need to divide up your work so that your two separate writers don't update the same rows.
This could mean an additional Spark Job to divide up your updates based on the rows they affect before parallelizing the DB writes.
If organizing your writes perfectly to avoid contention is not practical you could add increased parallelization (and get more granularity) and then retry jobs that fail due to deadlocks. This is a band-aid for the real problem which is contention. If you have a high number of deadlocks performance will be much worse with parallelization than it would be just running serially.
Upvotes: 1