diplomaticguru
diplomaticguru

Reputation: 675

Why establishing DB connection within foreachPartition and parallelizing it causing "ORA-00060: deadlock"?

I have a simple Spark job that map, compute and write results into Oracle DB. I'm having a problem when writing the results into DB.

After reducing the results by key, I'm calling foreachPartition action to establish a connect and to write results into DB. If I set the parallelisation to 1, it works fine. BUT when I change the parallelisation of the reducer to 2 or more, it only writes partial results. when I checked the log files, I see this error:

java.sql.BatchUpdateException: ORA-00060: deadlock detected while waiting for resource

How could I resolve this issue?

Upvotes: 1

Views: 390

Answers (1)

Michael
Michael

Reputation: 3608

Oracle is only going to deadlock when you have multiple writes to the same row (or a configuration that doesn't support your level of concurrency but I find this unlikely with 2 parallel writers.)

To get significant benefit from parallelism you need to divide up your work so that your two separate writers don't update the same rows.

This could mean an additional Spark Job to divide up your updates based on the rows they affect before parallelizing the DB writes.

If organizing your writes perfectly to avoid contention is not practical you could add increased parallelization (and get more granularity) and then retry jobs that fail due to deadlocks. This is a band-aid for the real problem which is contention. If you have a high number of deadlocks performance will be much worse with parallelization than it would be just running serially.

Upvotes: 1

Related Questions