Lho Ben

Reputation: 2149

Camunda: How to locate the step in my workflow that provokes an OptimisticLockingException

Under heavy load we are seeing a lot of OptimisticLockingExceptions and job retries for some of our processes, which causes a lot of trouble.

When not under load, the orchestrator doesn't throw any OptimisticLockingException.

Could you please suggest a way to locate which steps provoke these concurrent operations?

170556:2021/01/21 21:35:04.022 DEBUG ENGINE-16002 Exception while closing command context: ENGINE-03005 Execution of 'UPDATE ExecutionEntity[223d44fe-5c28-11eb-aa7e-eeeccf665d52]' failed. Entity was updated by another transaction concurrently. {"org.camunda.bpm.engine.OptimisticLockingException: ENGINE-03005 Execution of 'UPDATE ExecutionEntity[223d44fe-5c28-11eb-aa7e-eeeccf665d52]' failed. Entity was updated by another transaction concurrently.":null}

170986:2021/01/21 21:35:04.107 WARN ENGINE-14006 Exception while executing job 23e3a29c-5c28-11eb-80a2-eeeccf665d52:  {"org.camunda.bpm.engine.OptimisticLockingException: ENGINE-03005 Execution of 'UPDATE ExecutionEntity[223d44fe-5c28-11eb-aa7e-eeeccf665d52]' failed. Entity was updated by another transaction concurrently.":null}

107264:2021/01/21 21:35:36.407 DEBUG ENGINE-16002 Exception while closing command context: ENGINE-03005 Execution of 'DELETE TimerEntity[f723f288-5c27-11eb-aa7e-eeeccf665d52]' failed. Entity was updated by another transaction concurrently. {"org.camunda.bpm.engine.OptimisticLockingException: ENGINE-03005 Execution of 'DELETE TimerEntity[f723f288-5c27-11eb-aa7e-eeeccf665d52]' failed. Entity was updated by another transaction concurrently.":null}

If you can suggest a way to avoid retries of async tasks, that would be great, as asked in this question: https://forum.camunda.org/t/how-to-avoid-retry-of-async-service-tasks-when-an-optimisticlockingexception-occurs/21301

Env: 2 instances of a Spring Boot Camunda orchestrator

<camunda-bpm.version>3.4.0</camunda-bpm.version>
<camunda-engine.version>7.12.0</camunda-engine.version>

Postgres 9.12 with read_committed

Upvotes: 1

Views: 4119

Answers (1)

rob2universe

Reputation: 7583

OptimisticLockingExceptions (OLEs) are a mechanism to protect you from lost updates, which could otherwise result from concurrent access to the same execution data. Say two transactions both work on version V1 of the parent execution. One transaction updates it first (V1>V2). The second transaction's work, still based on V1, is now stale, so the process engine makes it redo its operations, this time based on the latest version of the execution (V2). The second transaction then creates a new version of the execution (V2>V3).
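The V1>V2>V3 sequence can be illustrated with a small in-memory sketch. This is a simplified model of version-based optimistic locking in general, not Camunda's actual implementation; `VersionedExecution`, `ExecutionStore`, `StaleVersionException` and `OptimisticLockDemo` are hypothetical names made up for the illustration.

```java
// Simplified, hypothetical model of version-based optimistic locking; not Camunda code.
final class VersionedExecution {
    final int version;
    final String state;

    VersionedExecution(int version, String state) {
        this.version = version;
        this.state = state;
    }
}

// Plays the role of the OptimisticLockingException in this sketch.
class StaleVersionException extends RuntimeException {
    StaleVersionException(int expected, int actual) {
        super("expected version " + expected + " but found " + actual);
    }
}

class ExecutionStore {
    private VersionedExecution current = new VersionedExecution(1, "created");

    synchronized VersionedExecution read() {
        return current;
    }

    // Accepts the write only if the caller saw the latest version,
    // similar in spirit to "UPDATE ... WHERE version = :expected".
    synchronized void update(int expectedVersion, String newState) {
        if (current.version != expectedVersion) {
            throw new StaleVersionException(expectedVersion, current.version);
        }
        current = new VersionedExecution(expectedVersion + 1, newState);
    }
}

class OptimisticLockDemo {
    static ExecutionStore runScenario() {
        ExecutionStore store = new ExecutionStore();

        // Both transactions read the execution while it is at V1.
        VersionedExecution seenByTx1 = store.read();
        VersionedExecution seenByTx2 = store.read();

        // Transaction 1 commits first: V1 > V2.
        store.update(seenByTx1.version, "branch A joined");

        try {
            // Transaction 2 still expects V1, so its write is rejected.
            store.update(seenByTx2.version, "branch B joined");
        } catch (StaleVersionException e) {
            // Retry on fresh data: V2 > V3.
            VersionedExecution fresh = store.read();
            store.update(fresh.version, "branch B joined");
        }
        return store;
    }

    public static void main(String[] args) {
        VersionedExecution result = runScenario().read();
        System.out.println("version=" + result.version + ", state=" + result.state);
    }
}
```

The retry in the `catch` block stands in for what the job executor does when a job fails with an OLE: the work is repeated against the current state instead of overwriting the other transaction's update.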

So OLEs can occur wherever concurrency occurs. Are you using parallel or inclusive gateways? Are events triggering concurrent token flow?
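As a starting point for finding those places, the ENGINE-03005 messages quoted in the question can be grouped by operation and entity type, and the affected entity IDs collected. The following is a minimal sketch that assumes the log lines keep the format shown in the question; `OleLogSummary` and its methods are hypothetical helpers, not part of Camunda.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for summarizing ENGINE-03005 log lines; not part of Camunda.
class OleLogSummary {
    // Matches e.g. "ENGINE-03005 Execution of 'UPDATE ExecutionEntity[<id>]' failed".
    private static final Pattern OLE = Pattern.compile(
            "ENGINE-03005 Execution of '(\\w+) (\\w+)\\[([^\\]]+)\\]' failed");

    // Counts log lines per "OPERATION EntityType", e.g. "UPDATE ExecutionEntity".
    static Map<String, Integer> countByOperation(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = OLE.matcher(line);
            // find() is called once per line, so a message repeated within a line counts once.
            if (m.find()) {
                counts.merge(m.group(1) + " " + m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Collects the distinct IDs of the given entity type that ran into an OLE.
    static Set<String> idsForEntity(List<String> lines, String entityType) {
        Set<String> ids = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher m = OLE.matcher(line);
            if (m.find() && m.group(2).equals(entityType)) {
                ids.add(m.group(3));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "DEBUG ENGINE-16002 Exception while closing command context: ENGINE-03005 Execution of "
                        + "'UPDATE ExecutionEntity[223d44fe-5c28-11eb-aa7e-eeeccf665d52]' failed.",
                "DEBUG ENGINE-16002 Exception while closing command context: ENGINE-03005 Execution of "
                        + "'DELETE TimerEntity[f723f288-5c27-11eb-aa7e-eeeccf665d52]' failed.");
        System.out.println(countByOperation(lines));
        System.out.println(idsForEntity(lines, "ExecutionEntity"));
    }
}
```

The execution IDs collected this way can then be matched against the affected process instances to see which activity they were at. How that lookup is done depends on your setup.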

Understand where concurrency occurs in the process model / engine and evaluate whether the concurrent execution is really needed. In many cases people model, for example, two service calls in parallel that only take milliseconds each. Then there is no gain in total processing time (creating and merging concurrent jobs also costs time), but the concurrency can become a burden. So prefer sequential execution where possible.

Check the duration of your transactions. If you have longer transactions that combine multiple service calls, it can help to split them into multiple jobs. Whether this pays off depends on the use case, since more jobs also mean more transactions.

The most important best practice when dealing with OLEs is to set "async before" on merging (joining) parallel gateways. This will not fully prevent the OLEs, but the built-in retry mechanism of the job executor will take care of them for you.

Last but not least, OLEs occur more often when the system is under high load and the DB is not performing well. Tune the overall system performance to reduce DB load and OLEs.

Upvotes: 1
