Reputation: 63

Best practice to make client handle eventual consistency of microservices

I've been reading some articles and questions on eventual consistency and choreographing microservices, but I haven't seen a clear answer to this question. I'll phrase it in generic terms.

In a nutshell: if a client historically makes subsequent synchronous REST calls to your system, what do you do when the later calls may return unexpected results once the calls are made to different microservices (due to eventual consistency)?

Problem

Suppose you have a monolithic application that provides a REST API. Let's say there are two modules A and B you want to convert to microservices. The entities that B maintains can refer to entities that A maintains (e.g. A maintains students and B maintains classes). In the monolithic situation, the modules simply refer to the same database, but in the microservices situation, they each have their own database and communicate via asynchronous messages. So their databases are eventually consistent with respect to each other.

Some existing third-party client applications of our API are used to first (synchronously) calling an endpoint belonging to module A and, after that first call returns, immediately (i.e. a few ms later) calling an endpoint in module B as part of their workflow (e.g. creating a student and putting it in a class). In the new situation, this leads to a problem: when the second call happens, module B may not be aware of the changes in module A yet. So the existing workflow of the client application may break. (E.g. module B may respond: the student you're trying to put in the class doesn't exist, or it is in the wrong year.)

When the calls are done separately by a human user through some frontend application, this is not a big issue, as the modules are usually consistent after a second anyway. The problem arises when a client application (which is not under our control) just calls A and then immediately B as part of an automated workflow. The eventual consistency is simply not fast enough in this instance.

[Figure: a simple diagram that describes the situation]

Question

Is there a best practice, or a generally agreed upon set of options, to mitigate this problem? (I made up the student/class example, don't get hung up on the specifics of that. :))

What we can think of

Simply telling the developers of these clients: from now on, you have to implement a retry mechanism for every endpoint you call. The drawback seems obvious.
Implement an API gateway that waits until B is ready. Drawback: there are many conceivable workflows (involving more modules A-Z) that would require this, so the gateway might become quite complex.
Somehow create a "session" for the client that tracks which requests it has made in succession. Then B can figure out whether it should wait for a message from A, or it could even update its state just by looking at the precise request the client made to A.
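To make the third idea more concrete, here is a minimal in-memory sketch, not a definitive implementation. It assumes A returns a monotonically increasing version number with each write, the client passes it along to B (for example in a header), and B's event consumer applies A's events in order. The class `ServiceB`, the `min_version` parameter and the status strings are all hypothetical names made up for illustration.

```python
import threading


class ServiceB:
    """Toy stand-in for module B; it applies events published by module A."""

    def __init__(self):
        self._cond = threading.Condition()
        self._applied_version = 0      # highest version of A's events applied so far
        self._known_students = set()

    def apply_event(self, version, student_id):
        """Called by the (hypothetical) message consumer when an event from A arrives."""
        with self._cond:
            self._known_students.add(student_id)
            self._applied_version = max(self._applied_version, version)
            self._cond.notify_all()    # wake up requests waiting for this version

    def enroll(self, student_id, min_version, timeout=2.0):
        """Wait until B has caught up to min_version, then handle the request."""
        with self._cond:
            caught_up = self._cond.wait_for(
                lambda: self._applied_version >= min_version, timeout=timeout
            )
            if not caught_up:
                return "503 retry later"      # B is still behind; the client may retry
            if student_id not in self._known_students:
                return "404 unknown student"  # B is up to date, so this is a real error
            return "200 enrolled"
```

With this, B only blocks requests that explicitly depend on a recent write, and it can tell "not yet known" apart from "really does not exist". The cost is that existing clients have to forward the token, so it does not help clients that cannot be changed at all.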

Are there better methods? Which would be most suitable?

Edit: Clarified that the question primarily concerns the behaviour of third-party clients that call the endpoints in an automated way, meaning that even a few milliseconds 'lag' in the eventual consistency can be fatal.

Upvotes: 5

Answers (2)

Cosmin Ioniță

Reputation: 4055

The strong-consistency-centric solution to this problem is based on distributed transactions, which unfortunately come with high complexity and performance implications.

In this amazing article on monolith-to-microservices migration, Zhamak Dehghani also addresses data inconsistency:

Distributed transactions are notoriously difficult to implement and as a consequence microservice architectures emphasize transactionless coordination between services, with explicit recognition that consistency may only be eventual consistency and problems are dealt with by compensating operations.

So eventual consistency is the only data consistency option in a microservices-based architecture, and if you need strong-consistency guarantees, you need to build workarounds (compensating operations), such as retry flows, which add complexity.
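As a rough illustration of what such a retry flow could look like on the client side, here is a minimal sketch with exponential backoff. The function name `call_with_retry`, its parameters and the default delays are assumptions chosen for the example; `request` stands for whatever function performs the call to B.

```python
import time


def call_with_retry(request, is_retryable, attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call request() until the response is not retryable or attempts run out.

    request      -- zero-argument callable performing the call (hypothetical)
    is_retryable -- predicate deciding whether a response looks like replication lag
    sleep        -- injectable so the backoff can be observed without really waiting
    """
    response = request()
    for attempt in range(1, attempts):
        if not is_retryable(response):
            break
        # Back off exponentially: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        response = request()
    return response
```

For example, `call_with_retry(lambda: enroll(student_id), lambda r: r.status == 404)` would retry only while B reports the student as unknown, where `enroll` and `r.status` again stand in for the client's own code. Note that a retry on "not found" cannot distinguish lag from a genuinely missing entity, so the number of attempts should stay small.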

Moreover, the article highlights a really insightful way of seeing the data inconsistency with respect to the business workflows:

Choosing to manage inconsistencies in this way is a new challenge for many development teams, but it is one that often matches business practice. Often businesses handle a degree of inconsistency in order to respond quickly to demand, while having some kind of reversal process to deal with mistakes. The trade-off is worth it as long as the cost of fixing mistakes is less than the cost of lost business under greater consistency.

Here is the way I see this problem:

It's true that the data stores of microservices A and B get updated asynchronously, but what is the exact latency of this update workflow? If we're talking about 1-2 seconds, then the inconsistency may not be perceived by the users at all. Otherwise, the system should be scaled out to support this (or an even lower) latency threshold.
You can monitor the inconsistency events (when a user tries to fetch data that doesn't exist yet in a store because the update is still in progress) and scale your system based on that.
The bottom line is that it may help to measure the actual need for such a consistency guarantee first, and then apply a suitable workaround.
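One way to do that measurement is sketched below, under the assumption that A stamps each event with its publish time and that B can record when it applied the event and when a request hit data that had not arrived yet. `LagMonitor` and its method names are hypothetical; in practice you would more likely feed these numbers into whatever metrics system you already use.

```python
class LagMonitor:
    """Collects replication lag samples and inconsistency events (illustrative only)."""

    def __init__(self, threshold_s):
        self.threshold_s = threshold_s  # lag considered acceptable, in seconds
        self.lags = []                  # observed publish-to-apply delays
        self.misses = 0                 # requests that hit not-yet-replicated data

    def record_applied(self, published_at, applied_at):
        """Record how long an event from A took to be applied in B."""
        self.lags.append(applied_at - published_at)

    def record_miss(self):
        """Record a request that failed because B had not caught up yet."""
        self.misses += 1

    def fraction_over_threshold(self):
        """Share of events whose lag exceeded the acceptable threshold."""
        if not self.lags:
            return 0.0
        return sum(lag > self.threshold_s for lag in self.lags) / len(self.lags)
```

If the fraction over the threshold or the miss count stays near zero, the workaround can be lightweight; if not, that is the signal to scale out or to reconsider the service boundaries.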

Upvotes: 2

David Browne - Microsoft

Reputation: 89051

Is there a best practice, or a generally agreed upon set of options, to mitigate this problem?

Yes. You can't break up every method into its own microservice with its own repository.

You scope your microservices and repositories to accommodate genuine requirements for strong consistency. If you have a use case where a call to service endpoint A is followed immediately by a call to service endpoint B that needs to see the results of the first call, then A and B should be part of the same microservice or share the same repository.

Upvotes: 3
