Rasmond

Reputation: 512

What to do when exception is thrown after state is modified?

In our system, the user places an order, which is a synchronous REST POST call. The service then modifies the state of the system.

Now we are struggling with how to clean up the state if the service has modified it but then fails, e.g. due to a system shutdown.

In an asynchronous approach it would be pretty straightforward: the message from the queue would not be processed, so it would be retried.
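The retry behaviour described above relies on a message being removed from the queue only after it has been processed successfully. Here is a minimal Python sketch of that idea, assuming a hypothetical in-memory `AckQueue` (not the API of any real broker; real brokers expose this as explicit acknowledgement, with broker-specific details):

```python
from collections import deque


class AckQueue:
    """Hypothetical in-memory queue: a message stays queued until it is acknowledged."""

    def __init__(self):
        self._pending = deque()

    def publish(self, msg):
        self._pending.append(msg)

    def consume(self, handler):
        """Deliver the head message; return True only if it was processed and acked."""
        if not self._pending:
            return False
        msg = self._pending[0]
        try:
            handler(msg)
        except Exception:
            # No ack: the message stays at the head and will be delivered again.
            return False
        self._pending.popleft()  # ack: only now is the message removed
        return True
```

Because a message can be delivered more than once, the handler has to be idempotent.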

However, in a synchronous approach the client has already received a 500 error and may never retry the action.

The only idea we have come up with is a background job that does the necessary cleanup (which seems like implementing eventual consistency). What is the correct way to do that?


NOTE:

This might apply to any system, but in our case the "state modification" is actually a complex operation across multiple microservices using the saga pattern, which needs to be rolled back if something fails.

Upvotes: 1

Views: 260

Answers (4)

StepUp

Reputation: 38199

across multiple microservices using the saga pattern, which needs to be rolled back if something fails

This looks like a case where compensating transactions should be used. There is a good resource, https://microservices.io, that describes what a compensating transaction is:

Implement each business transaction that spans multiple services as a saga. A saga is a sequence of local transactions. Each local transaction updates the database and publishes a message or event to trigger the next local transaction in the saga. If a local transaction fails because it violates a business rule then the saga executes a series of compensating transactions that undo the changes that were made by the preceding local transactions.

In addition, I highly recommend reading this article about compensating transactions.
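As a rough illustration of the quoted description, here is a minimal in-process Python sketch of a saga with compensations. It is a simplification under stated assumptions: the step names are hypothetical, and each step is a plain function here, whereas in a real saga each step is a local transaction in a separate service triggered by a message or event.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); all names used with this are hypothetical.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, undo the completed ones in reverse order."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed {name}")
            # The failed step never committed, so only the earlier steps are compensated.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated {done_name}")
            return False
        log.append(f"done {name}")
        completed.append(step)
    return True
```

Compensations can fail too, so in practice they are usually retried until they succeed, which is why they should also be idempotent.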

Upvotes: 0

Matt Timmermans

Reputation: 59303

It sounds like you're thinking about this the wrong way.

Now we are struggling with how to cleanup the state if the service modified it, but failed in the end, eg. due to system shutdown?

The service should ensure (using transactions, etc.) that the durable state only changes from one consistent state to another. If the service has committed a modification to the state, then the state is changed -- the order is made and that is the base truth -- there is nothing to "clean up".

In a asynchronous approach it would be pretty straightforward - the message from the queue would not be processed so it would be retried.

No, it is not straightforward. You have no guarantee that the user got a successful acknowledgement that the order has been made. His browser could have crashed after you added the order to the queue. The problem is the same:

The order has been made, but the user doesn't know whether it's been made or not.

The only idea we have come up with is to have a background job doing the necessary cleanup (seems like implementing eventual consistency). What is the correct way to do that?

No, the state is clean. The order is made. No clean-up is required. The problem is that the user doesn't know the state. If your app is still alive on the client side, then after receiving a 500 it should query the system to determine the state of the order. If the client-side app has died, you have to design your UI so that it's easy for the user to see the state of the order when he reconnects.
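The answer doesn't prescribe how the client finds "the state of the order" after a 500. One common technique, assumed here rather than taken from the answer, is to have the client generate the order id up front so it can look the order up afterwards. A minimal Python sketch with a hypothetical in-memory `OrderService`; in a real system the failure is an HTTP 500 and the lookup is a GET:

```python
import uuid


class OrderService:
    """Hypothetical server keyed by a client-supplied order id.

    crash is None, "before_commit" or "after_commit" to simulate a 500.
    """

    def __init__(self, crash=None):
        self.orders = {}
        self.crash = crash

    def place_order(self, order_id, item):
        if self.crash == "before_commit":
            raise RuntimeError("500 Internal Server Error")
        self.orders.setdefault(order_id, item)  # idempotent: a retry adds no duplicate
        if self.crash == "after_commit":
            # The order is committed, but the response never reaches the client.
            raise RuntimeError("500 Internal Server Error")
        return "created"

    def get_order(self, order_id):
        return self.orders.get(order_id)


def place_order_safely(service, item):
    order_id = str(uuid.uuid4())  # generated client-side so it can be queried later
    try:
        service.place_order(order_id, item)
        return order_id, "created"
    except RuntimeError:
        # Don't assume failure: ask the system what actually happened.
        if service.get_order(order_id) is not None:
            return order_id, "created"
        return order_id, "not created"
```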

Upvotes: 0

Mik378

Reputation: 22191

Now we are struggling with how to cleanup the state if the service modified it, but failed in the end, eg. due to system shutdown?

Without any further information, the answer would be pretty simple.
Your command handler (service) should be wrapped entirely in a transaction.
If the command has failed for a technical reason, then no transaction is committed.
Therefore, no state is changed.
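A minimal sketch of a command handler wrapped entirely in one transaction, using Python's built-in `sqlite3` as a stand-in for the service's database. The table names and the `fail_midway` flag are hypothetical; the point is that a failure after the first write leaves no partial state behind.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with hypothetical orders/stock tables."""
    conn = sqlite3.connect(":memory:")
    with conn:
        conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, item TEXT)")
        conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
        conn.execute("INSERT INTO stock VALUES ('book', 5)")
    return conn


def handle_place_order(conn, order_id, item, fail_midway=False):
    """Command handler wrapped entirely in one transaction."""
    # `with conn` commits if the block completes and rolls back if an exception escapes.
    with conn:
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        if fail_midway:
            # Simulates a technical failure after the first write.
            raise RuntimeError("simulated technical failure")
        conn.execute("UPDATE stock SET qty = qty - 1 WHERE item = ?", (item,))
```

If the process dies before the commit, the database discards the uncommitted writes itself. This only works within a single database; across services there is no shared transaction, which is why sagas exist.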

If your service is involved in a saga, then good practice is to save the saga's state to the database each time it changes, so that you can reload the saga with its last state as soon as the server restarts after a crash and get back to a consistent state.
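A minimal Python sketch of persisting saga state and resuming after a restart. Assumptions: `sqlite3` stands in for the service's database, the step names and `SimulatedCrash` are hypothetical, and recovery here rolls forward; depending on the saved state, a real saga might instead start its compensations.

```python
import sqlite3

# Hypothetical saga steps. Each should be idempotent: a step runs again if the
# process crashes after executing it but before its progress is saved.
STEPS = ["reserve_stock", "charge_card", "confirm_order"]


class SimulatedCrash(Exception):
    pass


def init_saga_table(conn):
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS saga (id TEXT PRIMARY KEY, next_step INTEGER)")


def start_saga(conn, saga_id):
    with conn:
        conn.execute("INSERT INTO saga (id, next_step) VALUES (?, 0)", (saga_id,))


def advance_saga(conn, saga_id, execute_step, crash_at=None):
    """Run the remaining steps, saving the saga's state after each one."""
    row = conn.execute("SELECT next_step FROM saga WHERE id = ?", (saga_id,)).fetchone()
    for i in range(row[0], len(STEPS)):
        if STEPS[i] == crash_at:
            raise SimulatedCrash(STEPS[i])
        execute_step(STEPS[i])
        with conn:
            conn.execute("UPDATE saga SET next_step = ? WHERE id = ?", (i + 1, saga_id))


def recover_sagas(conn, execute_step):
    """On startup, reload every unfinished saga from its last saved state and resume it."""
    rows = conn.execute("SELECT id FROM saga WHERE next_step < ?", (len(STEPS),)).fetchall()
    for (saga_id,) in rows:
        advance_saga(conn, saga_id, execute_step)
```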

Upvotes: 1

inf3rno

Reputation: 26137

In eventual consistency, either you send 202 Accepted, or you try to wait for the request to be processed. Eventual consistency is commonly achieved with events, and the most popular method for it is using domain events. Whenever you receive an HTTP request, you add a domain event to the domain event queue, which is usually saved to the event storage. So you have a series of events there, for example UserCreated, UserProfileUpdated, UserPasswordChanged, etc. This part is more or less synchronous, or at least the event queue saves the event to guarantee that it is not lost in the case of a power outage.

You modify your databases based on these events. In the case of CQRS you have query databases, which are eventually consistent with the event storage. So the event storage is always up to date, while the query databases may lag behind, usually by a few milliseconds or seconds; how much delay is acceptable depends on your business and on your consumers. It can also depend on the type of the event and the type of the database, so there can be priority databases and events that must be processed fast, and regular events that are less important. For example, in my opinion a password change should be almost immediate, but a profile update can wait even a few seconds.

In the case of eventual consistency there is no rollback after the event is stored in the event storage. All you can do is compensate for it, either with later events such as UserCreationCancelled or UserCreationFailed, or by handling it as an exception in your business process, e.g. removing the partially created user from the database manually and not automating these rare cases. The event storage describes the past, and you cannot change the past after things have happened.

A rare and, I think, bad approach is restoring a previous point of your query databases, removing the event from the event storage and processing the other events again. This is very complicated, and if you don't think of everything (e.g. the user was created and did something that affects other entities), you might end up with a broken timeline and broken databases.
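To illustrate compensating with a later event instead of rolling back, here is a minimal Python sketch of an append-only event store and a query-side projection rebuilt by replaying it. The event names come from the answer; the `Event` fields, `EventStore` and `project_users` are hypothetical, and a real event store would persist events durably rather than keep them in a list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    kind: str  # e.g. "UserCreated", "UserProfileUpdated", "UserCreationCancelled"
    user_id: str
    data: dict = field(default_factory=dict)


class EventStore:
    """Append-only log: events describe the past and are never edited or deleted."""

    def __init__(self):
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def all(self) -> List[Event]:
        return list(self._events)


def project_users(events: List[Event]) -> Dict[str, dict]:
    """Rebuild a query-side view of users by replaying the events in order."""
    users: Dict[str, dict] = {}
    for e in events:
        if e.kind == "UserCreated":
            users[e.user_id] = dict(e.data)
        elif e.kind == "UserProfileUpdated" and e.user_id in users:
            users[e.user_id].update(e.data)
        elif e.kind == "UserCreationCancelled":
            # Compensated by a later event; the original UserCreated stays in the log.
            users.pop(e.user_id, None)
    return users
```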

Upvotes: 0
