Dominic Bou-Samra

Reputation: 15416

How to model service dependencies in a microservice architecture?

We are trying to build our system in as decoupled a fashion as possible. We'd ideally like microservices to do only one thing, and one thing well. They should not know about dependencies. They should take a job from a queue, complete the job, and somehow emit a job completed event (I'll come back to this).

Our system contains "Snapshots" (images) as the base, atomic unit. An "Event" is a grouping of snapshots with a maximum length of 5 minutes.

Once we receive snapshots into our system, and figure out which event they belong to, we enqueue those snapshots to a RabbitMQ instance for some image analysis to be performed. We then have "snapshot-analyser" microservices pulling off this queue and performing image analysis. These microservices write directly to the database, appending some more metadata to the image objects. These are also stateless, and easy to scale horizontally.
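The snapshot-analyser stage described above can be sketched as a stateless worker loop. This is a minimal illustration, not the asker's actual code: an in-memory `queue.Queue` stands in for RabbitMQ, a dict stands in for the database, and the `analyse` function and field names are invented placeholders.

```python
import queue

def analyse(snapshot):
    # Placeholder for real image analysis; here we just flag dark images.
    return {"flagged": snapshot["brightness"] < 10}

def run_snapshot_analyser(jobs, db):
    """Stateless worker: pull a snapshot job, analyse it, append metadata.

    Writes results straight back to the snapshot record, as in the question.
    Because the worker holds no state of its own, you can run many copies
    in parallel to scale horizontally.
    """
    while True:
        try:
            snapshot = jobs.get_nowait()
        except queue.Empty:
            break
        db[snapshot["id"]].update(analyse(snapshot))

# Usage: two snapshots queued, one worker drains the queue.
jobs = queue.Queue()
db = {1: {"id": 1}, 2: {"id": 2}}
jobs.put({"id": 1, "brightness": 5})
jobs.put({"id": 2, "brightness": 200})
run_snapshot_analyser(jobs, db)
```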

The issue is that there are tasks to be done AFTER the snapshot-analyser has completed its work. If we detect certain attributes on a snapshot, we want to perform work on the Event containing it using an "event-analyser". We don't want to perform work on an Event more than once (so if multiple snapshots in the event have these attributes, we still only want the event processed once). This is proving quite challenging to engineer, especially in a distributed environment where we have several of these snapshot-analysers pulling off the queue. What we do currently: if we detect these attributes on a snapshot (meaning we want work done on the containing Event), we write that fact to the Event. If it's the FIRST time it has been written to the event, we enqueue the event to our second queue for Event processing. This ensures the event is queued at most once.
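The first-write-wins check described here can be expressed as a single conditional UPDATE, letting the database arbitrate which analyser wins the race: exactly one concurrent writer sees a row count of 1. A minimal sketch using SQLite, where the table, column, and event id are assumptions, not the asker's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, needs_analysis INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO events (id) VALUES (42)")

def mark_event(conn, event_id, enqueue):
    """Flag the event; enqueue it only if this call was the first to flag it."""
    cur = conn.execute(
        "UPDATE events SET needs_analysis = 1 WHERE id = ? AND needs_analysis = 0",
        (event_id,),
    )
    conn.commit()
    if cur.rowcount == 1:  # we won the race; everyone else sees rowcount 0
        enqueue(event_id)

enqueued = []
mark_event(conn, 42, enqueued.append)  # first snapshot with the attribute: enqueues
mark_event(conn, 42, enqueued.append)  # second snapshot: no-op
# enqueued == [42]
```

The same pattern works with any database that reports affected-row counts; the atomicity of the single UPDATE is what guarantees the event is enqueued at most once.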

The problems with the above approach are as follows:

Does anyone have any thoughts on, or examples of, how similar dependencies are modelled? Do I need a dispatcher service that is responsible for enqueuing to the right places and pulls from a job-done queue, or something similar?

Upvotes: 3

Views: 1073

Answers (1)

Rob Conklin

Reputation: 9481

Ultimately, your problem is one of needing to globally synchronize a distributed processing system. It's a very old problem, and most people fix it exactly the way you are fixing it: by using their database's built-in capabilities to synchronize distributed work. There are lots of other methodologies, but if you are already using a piece of infrastructure that does it well (and most databases do), then go ahead and leverage it.

I'd say for the other problem (decoupling snapshot-analyser from event-analyser), you either have to make snapshot-analyser aware of the requirement to only analyse an event once (as you are doing now), or make event-analyser aware of it. If you have snapshot-analyser blindly enqueue messages for event-analyser, and have event-analyser be the one that does the database work to avoid double-processing, you will nicely encapsulate the requirement, at the cost of extra messages on the queue. This has a bonus: event-analyser becomes a single choke-point where you could accumulate this state in memory instead of making external database calls.
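The consumer-side approach suggested here — snapshot-analyser enqueues blindly, event-analyser deduplicates — can be sketched the same way: the consumer claims the event with an insert that silently fails for duplicates, so repeated messages become no-ops. SQLite's `INSERT OR IGNORE` is used for illustration; the table name and message shape are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (id INTEGER PRIMARY KEY)")

def handle_event_message(conn, event_id, analyse):
    """Idempotent consumer: claim the event first; duplicate messages do nothing."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed_events (id) VALUES (?)", (event_id,)
    )
    conn.commit()
    if cur.rowcount == 1:  # this message was the first to claim the event
        analyse(event_id)

analysed = []
# Snapshot-analysers enqueued the same event three times; only one analysis runs.
for msg in [7, 7, 7]:
    handle_event_message(conn, msg, analysed.append)
# analysed == [7]
```

The trade-off Rob describes is visible here: the queue carries redundant messages, but the deduplication logic lives entirely inside event-analyser, and snapshot-analyser stays completely ignorant of its downstream dependency.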

Upvotes: 1