Dominic Bou-Samra

Reputation: 15416

How to model service dependencies in a microservice architecture?

We are trying to build our system in as decoupled a fashion as possible. We'd ideally like microservices to do only one thing, and one thing well. They should not know about dependencies. They should take a job from a queue, complete the job, and somehow emit a job completed event (I'll come back to this).

Our system contains "Snapshots" (images) as the base, atomic unit. An "Event" is a grouping of snapshots with a maximum length of 5 minutes.

Once we receive snapshots into our system, and figure out which event they belong to, we enqueue those snapshots to a RabbitMQ instance for some image analysis to be performed. We then have "snapshot-analyser" microservices pulling off this queue and performing image analysis. These microservices write directly to the database, appending some more metadata to the image objects. These are also stateless, and easy to scale horizontally.
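The snapshot-analyser stage described above can be sketched as a stateless worker loop. This is a minimal illustration, not the asker's actual code: an in-memory `queue.Queue` stands in for RabbitMQ, a dict stands in for the database, and the `analyse` function and field names are invented placeholders.

```python
import queue

def analyse(snapshot):
    # Placeholder for real image analysis; here we just flag dark images.
    return {"flagged": snapshot["brightness"] < 10}

def run_snapshot_analyser(jobs, db):
    """Stateless worker: pull a snapshot job, analyse it, append metadata.

    Writes results straight back to the snapshot record, as in the question.
    Because the worker holds no state of its own, you can run many copies
    in parallel to scale horizontally.
    """
    while True:
        try:
            snapshot = jobs.get_nowait()
        except queue.Empty:
            break
        db[snapshot["id"]].update(analyse(snapshot))

# Usage: two snapshots queued, one worker drains the queue.
jobs = queue.Queue()
db = {1: {"id": 1}, 2: {"id": 2}}
jobs.put({"id": 1, "brightness": 5})
jobs.put({"id": 2, "brightness": 200})
run_snapshot_analyser(jobs, db)
```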

The issue is that there are tasks to be done AFTER the snapshot-analyser has completed its work. If we detect certain attributes on a snapshot, we want to perform work on the Event containing it using an "event-analyser". We don't want to perform work on an Event more than once (so if multiple snapshots in the event have these attributes, we still only want the event processed once). This is proving quite challenging to engineer, especially in a distributed environment where we have several of these snapshot-analysers pulling off the queue. What we do currently: if we detect these attributes on a snapshot (meaning we want work done on the containing Event), we write that fact to the Event. If it's the FIRST time it has been written to the event, we enqueue the event to our second queue for Event processing. This ensures the event is queued at most once.
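The first-write-wins check described here can be expressed as a single conditional UPDATE, letting the database arbitrate which analyser wins the race: exactly one concurrent writer sees a row count of 1. A minimal sketch using SQLite, where the table, column, and event id are assumptions, not the asker's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, needs_analysis INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO events (id) VALUES (42)")

def mark_event(conn, event_id, enqueue):
    """Flag the event; enqueue it only if this call was the first to flag it."""
    cur = conn.execute(
        "UPDATE events SET needs_analysis = 1 WHERE id = ? AND needs_analysis = 0",
        (event_id,),
    )
    conn.commit()
    if cur.rowcount == 1:  # we won the race; everyone else sees rowcount 0
        enqueue(event_id)

enqueued = []
mark_event(conn, 42, enqueued.append)  # first snapshot with the attribute: enqueues
mark_event(conn, 42, enqueued.append)  # second snapshot: no-op
# enqueued == [42]
```

The same pattern works with any database that reports affected-row counts; the atomicity of the single UPDATE is what guarantees the event is enqueued at most once.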

The problems with the above approach are as follows:

Does anyone have any thoughts on, or examples of, how similar dependencies are modelled? Do I need a dispatcher service that is responsible for enqueuing to the right places and pulls from a job-done queue, or something similar?

Upvotes: 3

Views: 1073

Answers (1)

Rob Conklin

Reputation: 9481

Ultimately, your problem is one of needing to globally synchronize a distributed processing system. It's a very old problem, and most people fix it exactly the way you are fixing it: by using their database's built-in capabilities to synchronize distributed work. There are lots of other methodologies, but if you are already using a piece of infrastructure that does it well (and most databases do), then go ahead and leverage it.

I'd say for the other problem (decoupling snapshot-analyser from event-analyser), you either have to make snapshot-analyser aware of the requirement to only analyse an event once (as you are doing now), or make event-analyser aware of it. If you have snapshot-analyser blindly enqueue messages for event-analyser, and have event-analyser be the one that does the database work to avoid double-processing, you will nicely encapsulate the requirement, at the cost of extra messages on the queue. This has a bonus: event-analyser becomes a single choke-point where you could accumulate this state in memory instead of making external database calls.
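The consumer-side approach suggested here — snapshot-analyser enqueues blindly, event-analyser deduplicates — can be sketched the same way: the consumer claims the event with an insert that silently fails for duplicates, so repeated messages become no-ops. SQLite's `INSERT OR IGNORE` is used for illustration; the table name and message shape are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_events (id INTEGER PRIMARY KEY)")

def handle_event_message(conn, event_id, analyse):
    """Idempotent consumer: claim the event first; duplicate messages do nothing."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed_events (id) VALUES (?)", (event_id,)
    )
    conn.commit()
    if cur.rowcount == 1:  # this message was the first to claim the event
        analyse(event_id)

analysed = []
# Snapshot-analysers enqueued the same event three times; only one analysis runs.
for msg in [7, 7, 7]:
    handle_event_message(conn, msg, analysed.append)
# analysed == [7]
```

The trade-off Rob describes is visible here: the queue carries redundant messages, but the deduplication logic lives entirely inside event-analyser, and snapshot-analyser stays completely ignorant of its downstream dependency.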

Upvotes: 1