The Lost Update - Java, Spring and JPA

Question

I've been trying to solve an issue at work for months, and it's driving me nuts.

It is hard to explain because it involves some particularities of the domain that I'm not allowed to discuss, and I can't copy-paste the exact code. I'll try to make myself as clear as I can with some representative examples.

Briefly speaking, the system consists of a root entity; let's call it MainDocument. Several other entities orbit around it. MainDocument has a state; let's call it "MainDocumentState".

public class MainDocument {
   @OneToOne
   @JoinColumn(name = "document_state_id")
   MainDocumentState state;
   @Version
   long version = 0L;
}

There are around 10 states available, but this example will focus on two of them. Let's call them ReadyForAuthorization and Authorized.

That's all you need to know for the example.

About the technologies that we are using:

Spring
GWT Webapp
Java 1.6
Hibernate
JPA
Oracle DB.

About the issue itself:

There is a section of the system that is critical and handles most of the incoming traffic. Let's call this section "the authorization section". In this section, we send information via a SOAP WS provided by our country's Customs and Border Protection to authorize the MainDocument against Customs.

The code looks like this:

@Transactional
public void authorize(Integer mainDocId) {

  MainDocument mainDocument = mainDocumentService.findById(mainDocId);
  // if document is not found, an exception is thrown.
  Assert.isTrue(mainDocument.notAutorized(), "The document is already authorized");
  // more business logic validations happen here. These validations are not important for the topic discussed here. They make sure that the document meets some basic preconditions.

  try {
    // creates a transaction, an entity stored in the database that keeps track of all the authorization service calls
    Transaction aTransaction = transactionService.newTransaction();
    Response response;
    try {
      // take into account that sometimes this call can take between 2-4 minutes.
      response = wsAuthroizationService.sendAuthorization(mainDocument.getId(), mainDocument.getAuthorizationId());
    } catch (Exception e) {
      aTransaction.failed();
      transactionService.saveOrUpdate(aTransaction);
      throw e;
    }
    // the behaviour is the same for every error code.
    if (response.getCode() != 0) {
      aTransaction.setErrorCode(response.getCode());
      transactionService.saveOrUpdate(aTransaction);
      throw new AuthroizationError("Error on auth");
    }
    aTransaction.completed();
    mainDocument.setAuthorizationCode(0);
    mainDocument.authorize(); // will change state to "Authorized"
  } catch (Exception e) {
    mainDocument.authorize(); // will not change state because authorizationCode != 0 or it's null.
  } finally {
    saveOrUpdate(mainDocument);
  }
}

When the lost update happens and how it affects the system:

Thread-1 tries to authorize MainDocument id 1.
The document is not authorized, so execution continues.
The call goes through the web service and is authorized OK.
The transaction closes and the commit starts.
While Thread-1 is committing, Thread-2 comes in and tries to authorize MainDocument id 1.
Thread-1's change is not persisted yet, so Thread-2 passes the check and calls the web service.
Thread-2 is rejected by the WS with the response "the document 1 is already authorized".
Thread-2 tries to commit.
Thread-1 commits document 1 first; Thread-2 commits second.

MainDocument with id:1 is persisted with state ReadyForAuthorization, while the correct state should be Authorized.
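The interleaving above can be replayed step by step without any threads. The following is only a minimal sketch in plain Java, not the real system: the `Row` class, the in-memory map, and both write methods are hypothetical stand-ins for the table and for what the persistence layer does. It contrasts a blind write (last commit wins) with a write that checks the version read earlier, which is the behaviour one would expect from `@Version`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the database; none of these names exist in the real system.
class LostUpdateReplay {

    /** One row of the document table: a state plus an optimistic-lock version. */
    static final class Row {
        final String state;
        final long version;
        Row(String state, long version) { this.state = state; this.version = version; }
    }

    private final Map<Integer, Row> table = new HashMap<Integer, Row>();

    LostUpdateReplay() {
        table.put(1, new Row("ReadyForAuthorization", 0L));
    }

    Row read(int id) { return table.get(id); }

    /** Blind write: ignores whether the row changed since it was read. */
    void writeBlind(int id, Row readEarlier, String newState) {
        table.put(id, new Row(newState, readEarlier.version + 1));
    }

    /** Versioned write: succeeds only if the row still has the version that was read. */
    boolean writeIfVersionMatches(int id, Row readEarlier, String newState) {
        Row current = table.get(id);
        if (current.version != readEarlier.version) {
            return false; // stale copy, the write is refused
        }
        table.put(id, new Row(newState, current.version + 1));
        return true;
    }

    static String replayBlind() {
        LostUpdateReplay db = new LostUpdateReplay();
        Row seenByThread1 = db.read(1);
        Row seenByThread2 = db.read(1);                       // loaded before Thread-1 commits
        db.writeBlind(1, seenByThread1, "Authorized");        // Thread-1 commits
        db.writeBlind(1, seenByThread2, seenByThread2.state); // Thread-2 writes its stale state back
        return db.read(1).state;
    }

    static String replayVersioned() {
        LostUpdateReplay db = new LostUpdateReplay();
        Row seenByThread1 = db.read(1);
        Row seenByThread2 = db.read(1);
        boolean first = db.writeIfVersionMatches(1, seenByThread1, "Authorized");
        boolean second = db.writeIfVersionMatches(1, seenByThread2, seenByThread2.state);
        return db.read(1).state + " first=" + first + " second=" + second;
    }

    public static void main(String[] args) {
        System.out.println(replayBlind());     // prints "ReadyForAuthorization"
        System.out.println(replayVersioned()); // prints "Authorized first=true second=false"
    }
}
```

In the versioned replay the second write is refused because both copies were read at version 0 and the first commit moved the row to version 1. If the real system shows no such refusal, one thing worth checking is which version value the second transaction actually carried when it wrote.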

The complexity arises because it's nearly impossible to reproduce. It happens only in production and even if I try to flood the server with hundreds of calls, I can't get the same behavior.
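Flooding the server relies on luck to hit the narrow window, so one option is to force the ordering instead. The sketch below shows the idea with two threads and `CountDownLatch`; it is an assumption-laden illustration, not a test of the real code. The `AtomicReference` stands in for the database row, and in a real test the latches would presumably be triggered from a stubbed web service rather than inline.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical harness: the AtomicReference plays the database row and the
// latches pin down the order "both load, first commits, second commits".
class ForcedInterleaving {

    static String replay() {
        final AtomicReference<String> row =
                new AtomicReference<String>("ReadyForAuthorization");
        final CountDownLatch bothLoaded = new CountDownLatch(2);
        final CountDownLatch firstCommitted = new CountDownLatch(1);

        Thread first = new Thread(new Runnable() {
            public void run() {
                try {
                    row.get();                  // load the document
                    bothLoaded.countDown();
                    bothLoaded.await();         // wait until the other thread has loaded too
                    row.set("Authorized");      // web service accepted, commit
                    firstCommitted.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "Thread-1");

        Thread second = new Thread(new Runnable() {
            public void run() {
                try {
                    String stale = row.get();   // load the document before the first commit
                    bothLoaded.countDown();
                    bothLoaded.await();
                    firstCommitted.await();     // web service rejected; wait for the first commit
                    row.set(stale);             // write the stale copy back
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "Thread-2");

        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return row.get();
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints "ReadyForAuthorization"
    }
}
```

Because the latches fix the order of the reads and writes, the outcome is the same on every run, which is the property a reproduction of the production failure needs.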

Implemented solutions:

Thread barrier: if two threads with the same MainDocument id try to authorize, the last one to enter is rejected. It's implemented with an aspect with order 100, so it runs after the @Transactional commit. I tested it and checked on the stack trace that the transaction commits before the aspect intercepts and removes the thread from the barrier.
@Version: it works in other sections of the system, raising OptimisticLockException when one commit tries to overwrite another commit from an older transaction. In this case, OptimisticLockException is not being raised.
"Transaction" is persisted with @Transactional(propagation = REQUIRES_NEW), so it's independent of the main transaction and is committed correctly. These transactions make it clear that the lost update is the issue, because we can see the completed transaction with the success message and the MainDocument persisted with a different state, with no errors showing in the server.log.
Imperva SecureSphere: with it we can audit all updates on a specific table. We can clearly see the first transaction committing with the correct state and the second transaction overwriting the first.
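For readers unfamiliar with the thread barrier mentioned in the first item, a per-document guard of that kind can be sketched as follows. This is not the aspect from the real system; the class name `InFlightGuard` and its methods are hypothetical, and the sketch only shows the core idea of letting one caller per document id through at a time.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical guard: at most one in-flight authorization per document id.
class InFlightGuard {

    private final ConcurrentMap<Integer, Boolean> inFlight =
            new ConcurrentHashMap<Integer, Boolean>();

    /** Returns true if the caller now owns the id, false if someone else already does. */
    boolean tryEnter(Integer documentId) {
        // putIfAbsent is atomic: exactly one caller sees null for a given id.
        return inFlight.putIfAbsent(documentId, Boolean.TRUE) == null;
    }

    /** Must be called only after the owning transaction has committed or rolled back. */
    void leave(Integer documentId) {
        inFlight.remove(documentId);
    }

    public static void main(String[] args) {
        InFlightGuard guard = new InFlightGuard();
        System.out.println(guard.tryEnter(1)); // prints "true"
        System.out.println(guard.tryEnter(1)); // prints "false"
        guard.leave(1);
        System.out.println(guard.tryEnter(1)); // prints "true"
    }
}
```

A guard like this lives in the memory of a single JVM. If the application runs on more than one node, two requests for the same document that land on different nodes would both pass it, so it cannot replace a check made in the database.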

I would be grateful if someone with concurrency and transaction management experience could give me some useful tips on how to debug or reproduce the issue, or at least implement some solutions to mitigate the damage.

To be clear, there are more than 1000 requests per hour and 99.99% of these requests end correctly. The problem shows up in about 20 cases per month.

Added 09-13-17:

The saveOrUpdate method we are using, if needed:

  /**
   * See "JPA implementation patterns: Saving (detached) entities":
   * http://blog.xebia.com/2009/03/23/jpa-implementation-patterns-saving-detached-entities/
   *
   * @param entity
   */
  protected E saveOrUpdate(E entity) {
    if (entity.getId() == null) {
      getJpaTemplate().persist(entity);
      return entity;
    }
    if (!getJpaTemplate().getEntityManager().contains(entity)) {
      return merge(entity);
    }
    return entity;
  }
