dantebarba

Reputation: 1674

The Lost Update - Java, Spring and JPA

I've had an issue at work that I've been trying to solve for months, and it's driving me nuts.

It's hard to explain: it involves some particularities of the domain that I'm not allowed to discuss, and I can't copy-paste the exact code. I'll try to make myself as clear as I can with some representative examples.

Briefly speaking, the system consists of a root entity, let's call it MainDocument, with several other entities orbiting around it. The MainDocument entity has a state. Let's call this state "MainDocumentState".

public class MainDocument {
   @OneToOne
   @JoinColumn(name = "document_state_id")
   MainDocumentState state;
   @Version
   long version = 0L;
}

There are around 10 states available, but in this example I'll focus on two of them. Let's call them ReadyForAuthorization and Authorized.

That's all you need to know for the example.

About the technologies that we are using:

  1. Spring
  2. GWT Webapp
  3. Java 1.6
  4. Hibernate
  5. JPA
  6. Oracle DB.

About the issue itself:

There is a section of the system that is critical and handles most of the incoming traffic. Let's call this section "the authorization section". In this section, we send information via a SOAP WS provided by our country's Customs and Border Protection to authorize the MainDocument against Customs.

The code looks like this:

@Transactional
public void authorize(Integer mainDocId) {

  MainDocument mainDocument = mainDocumentService.findById(mainDocId);
  // if the document is not found, an exception is thrown.
  Assert.isTrue(mainDocument.notAuthorized(), "The document is already authorized");
  // more business logic validations happen here. These validations are not important for the topic discussed here. They make sure that the document meets some basic preconditions.

  try {

    Transaction aTransaction = transactionService.newTransaction(); // creates a transaction, an entity stored in the database that keeps track of all the authorization service calls
    Response response; // declared outside the inner try so it is still in scope below
    try {
      response = wsAuthorizationService.sendAuthorization(mainDocument.getId(), mainDocument.getAuthorizationId()); // take into account that sometimes this call can take between 2-4 minutes.
    } catch (Exception e) {
      aTransaction.failed();
      transactionService.saveOrUpdate(aTransaction);
      throw e;
    }
    // the behaviour is the same for every error code.
    if (response.getCode() != 0) {
      aTransaction.setErrorCode(response.getCode());
      transactionService.saveOrUpdate(aTransaction);
      throw new AuthorizationError("Error on auth");
    }
    aTransaction.completed();
    mainDocument.setAuthorizationCode(0);
    mainDocument.authorize(); // will change state to "Authorized"
  } catch (Exception e) {
    mainDocument.authorize(); // will not change state because authorizationCode != 0 or it's null.
  } finally {
    saveOrUpdate(mainDocument);
  }
}

When does the lost update happen, and how does it affect the system:

  1. MainDocument id: 1@Thread-1 tries to authorize.
  2. The document is not authorized, so the execution continues.
  3. It goes through the webservice and is authorized OK.
  4. The transaction closes and the commit happens.
  5. While Thread-1 is committing, MainDocument 1@Thread-2 comes in and tries to authorize.
  6. Document 1 is not persisted yet, so Thread-2 goes ahead with the authorization.
  7. Thread-2 is rejected by the WS with the response "the document 1 is already authorized".
  8. Thread-2 tries to commit.
  9. Thread-1 commits document 1 first; Thread-2 commits second.

MainDocument with id:1 is persisted with state ReadyForAuthorization, while the correct state should be Authorized.
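
To make the interleaving above concrete, here is a minimal in-memory sketch of it. It is an illustration only, not the production code: LostUpdateDemo, Row and both save methods are hypothetical stand-ins for the table row and for what an unversioned versus a version-checked UPDATE would do.

```java
/** In-memory stand-in for one MainDocument row (hypothetical, for illustration only). */
final class LostUpdateDemo {

    /** Immutable snapshot of the row: its state plus a @Version-style counter. */
    static final class Row {
        final String state;
        final long version;

        Row(String state, long version) {
            this.state = state;
            this.version = version;
        }
    }

    private Row stored = new Row("ReadyForAuthorization", 0L);

    /** Each simulated transaction starts by reading its own snapshot. */
    Row load() {
        return stored;
    }

    /** Blind write: ignores the snapshot's version, so the last writer wins. */
    void saveWithoutVersionCheck(Row snapshot, String newState) {
        stored = new Row(newState, stored.version + 1);
    }

    /** Versioned write: mimics UPDATE ... WHERE version = :expectedVersion. */
    boolean saveWithVersionCheck(Row snapshot, String newState) {
        if (stored.version != snapshot.version) {
            return false; // a JPA provider would raise OptimisticLockException here
        }
        stored = new Row(newState, snapshot.version + 1);
        return true;
    }

    /** Steps 1-9 with blind writes: Thread-2 overwrites Thread-1's result. */
    static String runBlind() {
        LostUpdateDemo db = new LostUpdateDemo();
        Row thread1 = db.load();
        Row thread2 = db.load(); // read before Thread-1 has committed
        db.saveWithoutVersionCheck(thread1, "Authorized");
        db.saveWithoutVersionCheck(thread2, thread2.state); // stale state written back
        return db.load().state;
    }

    /** Same interleaving with the version check: the stale write is refused. */
    static String runVersioned() {
        LostUpdateDemo db = new LostUpdateDemo();
        Row thread1 = db.load();
        Row thread2 = db.load();
        db.saveWithVersionCheck(thread1, "Authorized");
        db.saveWithVersionCheck(thread2, thread2.state); // returns false, nothing written
        return db.load().state;
    }
}
```

With the blind write the document ends in ReadyForAuthorization, which matches what we see in production. With the version check the second write is refused, which is what @Version is expected to do whenever the check is actually applied to the stale copy.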

The complexity arises because it's nearly impossible to reproduce. It happens only in production and even if I try to flood the server with hundreds of calls, I can't get the same behavior.

Implemented solutions:

  1. Thread barrier: if two threads with the same MainDocument id try to authorize, the last to enter is rejected. It's implemented with an aspect with order 100, so it's executed after the @Transactional commit. I tested and checked in the stack trace that the transaction commits before the aspect intercepts and removes the thread from the barrier.
  2. @Version, which works in other sections of the system, raising OptimisticLockException when one commit tries to overwrite another commit from an older transaction. In this case OptimisticLockException is not being raised.
  3. "Transaction" is persisted with @Transactional(propagation = REQUIRES_NEW), so it's independent from the main transaction and is committed correctly. With these transactions it's clear that the lost update is an issue, because we can see the completed transaction with the success message and the MainDocument persisted with a different state, with no errors showing in server.log.
  4. Using Imperva SecureSphere we can audit all updates on a specific table. We can clearly see the first transaction committing with the correct state and the second transaction overwriting the first.

I would be grateful if someone with concurrency and transaction management experience could give me some useful tips on how to debug or reproduce the issue, or at least implement some solutions to mitigate the damage.

To be clear, there are more than 1000 requests per hour and 99.99% of these requests end correctly. The problem shows up in about 20 cases per month.

Added 09-13-17:

The saveOrUpdate method we are using, if needed:

  /**
   * <a href="http://blog.xebia.com/2009/03/23/jpa-implementation-patterns-saving-detached-entities/">JPA
   * implementation patterns: Saving (detached) entities</a>
   *
   * @param entity
   */
  protected E saveOrUpdate(E entity) {
    if (entity.getId() == null) {
      getJpaTemplate().persist(entity);
      return entity;
    }
    if (!getJpaTemplate().getEntityManager().contains(entity)) {
      return merge(entity);
    }
    return entity;
  }

Upvotes: 4

Views: 1623

Answers (1)

Vitor Santos

Reputation: 611

The main problem is concurrency. As your code stands now, it checks whether the entity was authorized, when it should check whether it was authorized OR is in the process of being authorized.

This leads to the important question: how do you check whether an entity is already being manipulated somewhere else in the system?

I've faced some situations that look similar, including scenarios with code running in clusters. The best working solution I found was to use some form of database lock.

@Version should be a good and quick solution, but you stated that it's not working properly. You also stated that you can audit the database using a tool; it would be interesting to check how the version field behaves in this case.

Without @Version, I would try a "hardcore" pessimistic database lock. The proposed solution is certainly not the only one, or the best one.

1 - Create a new table. This table will store the Ids of documents being processed. The PK should be the document Id, or anything else that ensures that the same document won't have duplicates in this table.

2 - In your code, before retrieving the entity, check if the id is in the table created in step 1. If it's not, go ahead. If it is, assume it is being processed and do nothing.

3 - In your code, right after retrieving the entity, you must insert the ID in the table created in step 1.
If the document is not being authorized, the insert will be successful and the process continues.
If by any chance, two requests are being executed at the same time, one of the requests will get a Constraint Violation Exception (or something similar). Then your code should assume that the document is being authorized.
Important: the insertion must be executed in a new transaction. The Spring bean that is used to persist the Id in the new table should have its methods marked as @Transactional(propagation = Propagation.REQUIRES_NEW).

4 - After the webservice is called and the response is properly processed, remove the Id from the table created in step 1. This should also be executed in a separate transaction.
Consider doing it in a finally block, because if any other runtime error occurs, the document id should still be removed from the table.
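
Steps 1 to 4 can be sketched roughly as follows (Java 8 syntax for brevity). This is only an outline under assumptions: ProcessingTable, InMemoryProcessingTable and AuthorizationGuard are made-up names, and the in-memory set stands in for the real table, where tryInsert would be an INSERT run in a REQUIRES_NEW transaction that reports a primary-key violation as false. The separate lookup from step 2 is left out because the insert itself already acts as the atomic check.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical guard around the authorization call, following steps 1-4. */
final class AuthorizationGuard {

    /** Hypothetical port for the table from step 1. */
    interface ProcessingTable {
        /** Returns false when the id is already present (PK violation in a real table). */
        boolean tryInsert(int documentId);

        void delete(int documentId);
    }

    /** In-memory stand-in so the sketch runs without a database. */
    static final class InMemoryProcessingTable implements ProcessingTable {
        private final Set<Integer> ids = ConcurrentHashMap.newKeySet();

        @Override
        public boolean tryInsert(int documentId) {
            return ids.add(documentId); // add() is atomic: only one caller gets true
        }

        @Override
        public void delete(int documentId) {
            ids.remove(documentId);
        }
    }

    private final ProcessingTable table;

    AuthorizationGuard(ProcessingTable table) {
        this.table = table;
    }

    /**
     * Runs the work only if no other request currently holds the document id.
     * Returns false when the document is assumed to be in process elsewhere.
     */
    boolean runExclusively(int documentId, Runnable work) {
        if (!table.tryInsert(documentId)) {
            return false; // step 3: someone else is authorizing this document
        }
        try {
            work.run(); // the webservice call and state change would go here
            return true;
        } finally {
            table.delete(documentId); // step 4: always release the id
        }
    }
}
```

One thing to keep in mind with a real table: if the server dies between the insert and the delete, the id stays behind, so some cleanup of old rows would be needed.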

How to debug:

  • Run the app in a local environment, and put a breakpoint right after the entity is retrieved and before the insertion in the new table. If you want to debug your current code, then I would put the breakpoint right after the Assert statement.

  • Open two different browsers on your dev machine, and perform the use case that triggers this code. You can also ask a team member to perform it from their machine.

  • You should see your IDE showing the code being executed at the breakpoint twice. After that just let both executions run, one after another, and enjoy the show. The scenario should be reproduced.

  • Basically this emulates two simultaneous requests.
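
If breakpoints are awkward to coordinate, a small harness can release several calls at the same instant instead. The sketch below uses only java.util.concurrent (Java 8 syntax); ConcurrentCalls and fire are hypothetical names, and the Callable would wrap whatever triggers the authorization in your environment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper that starts the same call on several threads at once. */
final class ConcurrentCalls {

    /** Parks n threads at a latch, releases them together, and collects the results. */
    static <T> List<T> fire(int n, Callable<T> call) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        CountDownLatch ready = new CountDownLatch(n); // counts threads that reached the gate
        CountDownLatch go = new CountDownLatch(1);    // the gate itself
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                futures.add(pool.submit(() -> {
                    ready.countDown();
                    go.await(); // block until every thread is parked here
                    return call.call();
                }));
            }
            ready.await();  // wait until all n threads are at the gate
            go.countDown(); // open the gate for all of them at once
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get()); // rethrows a failure wrapped in ExecutionException
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

For example, `ConcurrentCalls.fire(2, () -> { service.authorize(1); return null; })`, with `service` being your own bean, would start two authorizations of the same document together. It does not guarantee that the bug shows up; it only removes the timing luck from the start of the calls.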

Considerations:

  • I chose to use a database table because this solution will work even if the app is deployed in a cluster environment (multiple app server instances).
  • If there is only a single instance running, you could try using an object shared across requests, but if in the future you need to scale your app using clusters, then the solution won't work. Also, you'll have to deal with thread safety.
  • You could also try using database locking, but you have to be careful not to lock your table/row for too long. Also, JPA 1.0 doesn't have any specific operation to take pessimistic locks on rows (at least I couldn't find one), so there you would have to deal with native SQL. JPA 2.0 added pessimistic lock modes such as LockModeType.PESSIMISTIC_WRITE.

Upvotes: 1
