I'm having an issue at work that I've been trying to solve for months, and it's driving me nuts.
The problem is hard to explain: it involves some particularities of the domain that I'm not allowed to discuss, and I can't copy-paste the exact code. I'll try to make myself as clear as I can with some representative examples.
Briefly, the system consists of a root entity; let's call it the MainDocument entity. Several other entities orbit around it. The MainDocument entity has a state; let's call this state "MainDocumentState".
```java
@Entity
public class MainDocument {
    // id, authorization code and other fields omitted

    @OneToOne
    @JoinColumn(name = "document_state_id")
    MainDocumentState state;

    @Version
    long version = 0L;
}
```
There are around 10 states available, but in this example we will focus on two of them. Let's call them ReadyForAuthorization and Authorized.
That's all you need to know for the example.
About the technologies that we are using:
About the issue itself:
There is a critical section of the system that handles most of the incoming traffic; let's call it "the authorization section". In this section, we send information via a SOAP web service provided by our country's Customs and Border Protection to authorize the MainDocument against Customs.
The code looks like this:
```java
@Transactional
public void authorize(Integer mainDocId) {
    MainDocument mainDocument = mainDocumentService.findById(mainDocId);
    // if the document is not found, an exception is thrown.
    Assert.isTrue(mainDocument.notAuthorized(), "The document is already authorized");
    // More business-logic validations happen here. They are not important for this topic;
    // they only make sure that the document meets some basic preconditions.
    try {
        // creates a Transaction, an entity stored in the database that keeps track of all authorization service calls
        Transaction aTransaction = transactionService.newTransaction();
        Response response; // declared here so it is still in scope after the inner try/catch
        try {
            // note that this call sometimes takes between 2 and 4 minutes
            response = wsAuthorizationService.sendAuthorization(mainDocument.getId(), mainDocument.getAuthorizationId());
        } catch (Exception e) {
            aTransaction.failed();
            transactionService.saveOrUpdate(aTransaction);
            throw e;
        }
        // the behaviour is the same for every error code.
        if (response.getCode() != 0) {
            aTransaction.setErrorCode(response.getCode());
            transactionService.saveOrUpdate(aTransaction);
            throw new AuthorizationError("Error on auth");
        }
        aTransaction.completed();
        mainDocument.setAuthorizationCode(0);
        mainDocument.authorize(); // will change state to "Authorized"
    } catch (Exception e) {
        mainDocument.authorize(); // will not change state because authorizationCode != 0 or it's null.
    } finally {
        saveOrUpdate(mainDocument);
    }
}
```
When the lost update happens and how it affects the system:
MainDocument with id:1 is persisted with state ReadyForAuthorization, while the correct state should be Authorized.
What makes this so hard is that it's nearly impossible to reproduce: it happens only in production, and even if I flood the server with hundreds of calls, I can't get the same behavior.
Implemented solutions:
I would be grateful if someone with experience in concurrency and transaction management could give me some useful tips on how to debug or reproduce the issue, or at least suggest some solutions to mitigate the damage.
To be clear, there are more than 1000 requests per hour and 99.99% of them end correctly. The problem shows up in about 20 cases per month.
Added 09-13-17:
The saveOrUpdate method we are using, in case it's needed:
```java
/**
 * <a href="http://blog.xebia.com/2009/03/23/jpa-implementation-patterns-saving-detached-entities/">JPA
 * implementation patterns: Saving (detached) entities</a>
 *
 * @param entity
 */
protected E saveOrUpdate(E entity) {
    if (entity.getId() == null) {
        getJpaTemplate().persist(entity);
        return entity;
    }
    if (!getJpaTemplate().getEntityManager().contains(entity)) {
        return merge(entity);
    }
    return entity;
}
```
The main problem is concurrency. As your code stands now, it checks whether the entity was already authorized, when it should check whether it was authorized OR is in the process of being authorized.
This leads to the important question: how do you check whether an entity is already being manipulated somewhere else in the system?
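To make the suspected race concrete, here is a minimal, hypothetical sketch in plain Java (no JPA, no Spring) of one interleaving that would produce exactly the symptom in the question. StateTable and CheckThenActRace are made-up names, and the map only stands in for the database. The sketch assumes, as a hypothesis, that the slower request's web service call fails and that its unchanged copy of the document is still written back in full by the finally block, so its stale state overwrites the committed one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the MainDocument table: document id -> state column.
final class StateTable {
    private final Map<Integer, String> rows = new HashMap<>();

    void save(int id, String state) { rows.put(id, state); }

    String load(int id) { return rows.get(id); }
}

final class CheckThenActRace {
    static String run() {
        StateTable table = new StateTable();
        table.save(1, "ReadyForAuthorization");

        // Requests A and B each load their own copy of document 1...
        String copyA = table.load(1);
        String copyB = table.load(1);

        // ...and both pass the "not authorized" check, because neither copy is Authorized yet.
        boolean aPasses = !copyA.equals("Authorized");
        boolean bPasses = !copyB.equals("Authorized");
        if (!aPasses || !bPasses) {
            throw new IllegalStateException("expected both requests to pass the check");
        }

        // A's web service call succeeds, so A's copy becomes Authorized and is saved.
        copyA = "Authorized";
        table.save(1, copyA);

        // Assumption: B's call fails, so authorize() leaves B's copy unchanged,
        // but the finally block still saves it, overwriting A's update.
        table.save(1, copyB);

        return table.load(1); // "ReadyForAuthorization": A's update has been lost
    }
}
```

The point is that the check and the final save are far apart in time (the web service call can take minutes), so a check on a private copy says nothing about what other requests are doing meanwhile.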
I've faced some situations that look similar, including scenarios with code running in clusters. The best working solution I found was to use some form of database lock.
@Version should be a good, quick solution, but you stated that it's not working properly. You also stated that you can audit the database using a tool; it would be interesting to check how the version field behaves in this case.
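For comparison, this is roughly what @Version is supposed to do, emulated in memory: every write is a compare-and-set on the version column, equivalent to `UPDATE ... SET state = ?, version = version + 1 WHERE id = ? AND version = ?`. VersionedRow, VersionedTable and OptimisticLockDemo are hypothetical names for this sketch only. In JPA, an update count of 0 on a versioned entity is reported as an OptimisticLockException, so in the race the stale write would be rejected instead of silently applied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory row: a state column plus a version column, mirroring MainDocument's @Version field.
final class VersionedRow {
    final String state;
    final long version;

    VersionedRow(String state, long version) {
        this.state = state;
        this.version = version;
    }
}

final class VersionedTable {
    private final Map<Integer, VersionedRow> rows = new HashMap<>();

    synchronized void insert(int id, String state) {
        rows.put(id, new VersionedRow(state, 0L));
    }

    synchronized VersionedRow read(int id) {
        return rows.get(id);
    }

    // Emulates: UPDATE ... SET state = ?, version = version + 1 WHERE id = ? AND version = ?
    // Returns the update count; 0 means another transaction changed the row since it was read.
    synchronized int update(int id, String newState, long expectedVersion) {
        VersionedRow current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return 0;
        }
        rows.put(id, new VersionedRow(newState, expectedVersion + 1));
        return 1;
    }
}

final class OptimisticLockDemo {
    static String run() {
        VersionedTable table = new VersionedTable();
        table.insert(1, "ReadyForAuthorization");
        VersionedRow copyA = table.read(1); // both requests read version 0
        VersionedRow copyB = table.read(1);

        int updatedByA = table.update(1, "Authorized", copyA.version); // 1: version 0 -> 1
        int updatedByB = table.update(1, copyB.state, copyB.version);  // 0: B still expects version 0
        return "A=" + updatedByA + ", B=" + updatedByB + ", state=" + table.read(1).state;
    }
}
```

`OptimisticLockDemo.run()` returns `"A=1, B=0, state=Authorized"`. If the audit shows the overwrite happening without the version being checked or incremented, that is a strong hint about where the stale write comes from.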
Without @Version, I would try a "hardcore" pessimistic database lock. The proposed solution is certainly not the only one, or the best one.
1 - Create a new table. This table will store the Ids of documents being processed. The PK should be the document Id, or anything else that ensures that the same document won't have duplicates in this table.
2 - In your code, before retrieving the entity, check if the id is in the table created in step 1. If it's not, go ahead. If it is, assume it is being processed and do nothing.
3 - In your code, right after retrieving the entity, you must insert the ID in the table created in step 1.
If the document is not being authorized, the insert will succeed and the process continues.
If, by any chance, two requests are executed at the same time, one of them will get a constraint violation exception (or something similar). Your code should then assume that the document is being authorized.
Important: the insertion must be executed in a new transaction. The Spring bean used to persist the id in the new table should have its methods annotated with @Transactional(propagation = Propagation.REQUIRES_NEW).
4 - After the web service is called and the response is properly processed, remove the id from the table created in step 1. This should also be executed in a separate transaction.
Consider doing it in a finally block: if any other runtime error occurs, the document id must still be removed from the table.
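The four steps above can be sketched as follows. This is a minimal in-memory model under stated assumptions, not the answer's actual implementation: a concurrent set stands in for the new table, and `Set.add` returning false stands in for the constraint violation on the primary key. ProcessingRegistry and GuardedAuthorizer are hypothetical names. In production, tryAcquire and release would be an INSERT and a DELETE issued by the Spring bean described in step 3, each in its own REQUIRES_NEW transaction.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the "documents being processed" table (document id is the PK).
final class ProcessingRegistry {
    private final Set<Integer> inProgress = ConcurrentHashMap.newKeySet();

    // Step 3: "insert" the id. Returning false plays the role of the constraint violation.
    // The insert alone is what guarantees exclusivity; the step 2 pre-check only avoids useless work.
    boolean tryAcquire(int documentId) {
        return inProgress.add(documentId);
    }

    // Step 4: "delete" the id once processing is over.
    void release(int documentId) {
        inProgress.remove(documentId);
    }
}

final class GuardedAuthorizer {
    private final ProcessingRegistry registry;

    GuardedAuthorizer(ProcessingRegistry registry) {
        this.registry = registry;
    }

    // Returns true if this request did the work, false if another request already holds the document.
    boolean authorize(int documentId, Runnable webServiceCall) {
        if (!registry.tryAcquire(documentId)) {
            return false; // assume the document is already being authorized
        }
        try {
            webServiceCall.run();
            return true;
        } finally {
            registry.release(documentId); // runs even if the call throws a runtime exception
        }
    }
}
```

One design consequence worth noting: if the process dies between the insert and the delete, the id stays in the table, so some cleanup (for example, a timestamp column and a periodic purge) would probably be needed.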
How to debug:
Run the app in a local environment, and put a breakpoint right after the entity is retrieved and before the insertion in the new table. If you want to debug your current code, then I would put the breakpoint right after the Assert statement.
Open two different browsers in your dev machine, and perform the use case that triggers this code. You can also ask for a team member to perform it from his machine.
You should see your IDE stop at the breakpoint twice. After that, just let both executions run, one after another, and enjoy the show: the scenario should be reproduced.
Basically this emulates two simultaneous requests.
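The same breakpoint procedure can be approximated in an automated test. In this hypothetical sketch, a CyclicBarrier plays the role of the breakpoint (neither request continues until both have read the document), and a CountDownLatch makes them finish one after another. The map is only a stand-in for the database and TwoRequestRepro is a made-up name; a real test would call the actual service from both threads and place the barrier where the breakpoint would go.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;

final class TwoRequestRepro {
    static String run() {
        // Hypothetical in-memory stand-in for the document table: id -> state.
        Map<Integer, String> table = new ConcurrentHashMap<>();
        table.put(1, "ReadyForAuthorization");

        // The "breakpoint": neither request continues until both have read the row.
        CyclicBarrier breakpoint = new CyclicBarrier(2);
        // Lets the executions finish one after another: B resumes only after A has saved.
        CountDownLatch aSaved = new CountDownLatch(1);

        Thread requestA = new Thread(() -> {
            try {
                table.get(1); // A reads; the notAuthorized() check passes
                breakpoint.await();
                table.put(1, "Authorized"); // A's web service call succeeds
            } catch (Exception e) {
                throw new IllegalStateException(e);
            } finally {
                aSaved.countDown();
            }
        });

        Thread requestB = new Thread(() -> {
            try {
                String copy = table.get(1); // B reads the same, still unauthorized, state
                breakpoint.await();
                aSaved.await();
                table.put(1, copy); // B's call fails; the finally-save writes its stale copy back
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        });

        requestA.start();
        requestB.start();
        try {
            requestA.join();
            requestB.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return table.get(1); // "ReadyForAuthorization": A's update was lost
    }
}
```

Because the ordering is forced by the barrier and the latch, the lost update shows up on every run instead of about 20 times a month, which makes it possible to check whether a fix actually closes the race.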
Considerations: