Steve Chamaillard
Steve Chamaillard

Reputation: 2339

Microservices and isolated persistence - how should the data be stored/fetched?

At my company, we're about to move to the micro services architecture. I read a lot about it, and there are tons of obscure areas where it's specific to the project built, but one area seems to get everyone to agree, microservices need to have isolated persistence or another way to say it, they need to have they own database.

Now I love the idea, that means every microservice has its own database schema, its own domain objects and is 100% independent of any other microservice data structure.

There are things I don't quite understand though.

The "Customer Service" is obviously central to the application, and we can see that basically any other microservice will need some data about the user at some point. Whether it'd be the user's credit amount, its ID, or its name.

But since other microservices can't directly read into the Customer Service database, they'll need to query this service over and over again. This is fine (I guess) for simple stuff like getting the name of current logged user, but when we need to display 60 users on a page and we can't do any SQL join, it feels like we're missing something. This is even worse when microservices depend upon tons of microservices.

So I found out that some people actually queried microservices X times a day to get data into their own microservices.

So if microservice "Search" needs data from "Product", "Customer", it'll actually query these microservices and will persist the data with its own data structure.

The question I have is should it be "Search" that queries "Product" and "Customer", or should "Product" and "Customer" send data to "Search" ?

The first option looks a bit easier to do, we only need to have this logic on one side, and that's where the data is needed. But we'll only get static freshness of data which is not very smart, but could definitely work. The second option looks a bit more difficult but more scalable too, because we could have very fresh data when we need it, since the data changed where it's sent, it could also be more granular.

Upvotes: 1

Views: 1450

Answers (2)

techagrammer
techagrammer

Reputation: 1306

your identification of the problem is correct.

But the solution to your problem will depend on use case to use case.

In your example of search service , product service and customer service should publish their events on kafka or similar messaging and search service listen to them and updates it.

In case of lets say in order service while creating an order for a customer , you want to check customer exists , then you might do it by calling the sync api of customer service , but for that also there are variour other approaches , i have answered here linking Microservices and allowing for one to be unavailable

From my perspective sync communication between services should be avoided , and there are way around for this , above link would help

You can use domain driven design philosophy to correctly break your services and their contract

Upvotes: 1

usr
usr

Reputation: 171188

I think you correctly identified downsides to the microservices approach! And there are no elegant solutions to these specific problems. You will have to eat the additional work and architecture deterioration that this brings.

Concretely addressing your question now:

The question I have is should it be "Search" that queries "Product" and "Customer", or should "Product" and "Customer" send data to "Search" ?

You seem to be looking for a data synchronization service. You want to decide between push and pull. You are concerned about data freshness and logic duplication.

The key point here is that the source service cannot know about its consumers. This is to prevent an unwanted reverse dependency. This would break architectural isolation. Any data sync process that maintains this is fine. You can do what is most convenient.

For example, you could make the data source expose two APIs:

  1. An API to get the whole data set. This would be called periodically by the destination (e.g. nightly). It can also be used to seed the destination at will and to fix data errors there.
  2. A feed of changes in the source database keyed by the date and time the change occurred. The destination can now poll that change feed very frequently (e.g. every few seconds or minutes) and apply the small delta that occurred.

You can even build a realtime change feed through a publish-subscribe middleware. Many message queue softwares can do that. The source would just send out changes to the middleware.

Building all of this is conceptually simple but takes a lot of work. It also creates lots of ongoing work and increases the potential for bugs. Debugging becomes much harder. I have worked on systems like that.


I'm going to add a subjective note: Microservices are not well understood by many teams. The downsides are often ignored. You identified a few of the downsides correctly and they are nasty! Given what I read on the web I believe many teams do not realize the mess they are getting themselves into. Managing disparate data stores can be a nightmare. This is not a one-time "mess" but an ongoing one.

As an alternative I'd recommend using a common data store and building services simply as classes or projects that live in the same process. This gives you the microservices code structuring with the convenience of normal development. It also leaves a few of the upsides of microservices on the table.

Upvotes: 1

Related Questions