Jobs with dependencies in a queue (pub-sub) in distributed systems?

Question

How should I approach a problem where jobs are put in a queue (pub-sub) in a distributed system and have dependencies between them?

For example, the current state of the queue:
j3 -> j2 -> j1
rear      front
j3 depends on the completion of j1.

The queue processor consumes these jobs and starts processing them in a distributed environment.

Based on some dependency resolution mechanism, the dependency between j1 and j3 is discovered.

Now, what I don't know is the best way to deal with this situation:

Should I put j3 back in the queue and pick it up again at a later stage, so that j1 will have completed by that time?
Or should I have some other mechanism, such as a database, to check whether all of j3's dependencies have been met before processing j3?

Any help would be appreciated.

Thanks!

Drathier · Accepted Answer

Having a job scheduler that's aware that these jobs are at the front of the queue, but are waiting on some dependencies, is the best way. That way, you can get other jobs done while waiting for the dependencies to finish, but still process them as much in order as possible.
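To make the idea concrete, here is a minimal in-process sketch of such a scheduler. The class and method names (`DependencyAwareScheduler`, `submit`, `next_ready`, `mark_finished`) are made up for illustration, and a real distributed setup would keep this state somewhere shared rather than in one process's memory.

```python
from collections import deque


class DependencyAwareScheduler:
    """Hands out jobs in FIFO order, skipping jobs whose dependencies are unfinished.

    Hypothetical sketch: all state lives in memory in a single process.
    """

    def __init__(self):
        # Each entry is (job_id, set_of_dependency_ids); index 0 is the front.
        self.pending = deque()
        self.finished = set()

    def submit(self, job_id, depends_on=()):
        self.pending.append((job_id, set(depends_on)))

    def next_ready(self):
        """Return the oldest job whose dependencies have all finished, or None."""
        for i, (job_id, deps) in enumerate(self.pending):
            if deps <= self.finished:
                del self.pending[i]  # safe: we return right after mutating
                return job_id
        return None

    def mark_finished(self, job_id):
        self.finished.add(job_id)


# The queue from the question, plus a hypothetical j4 queued behind j3.
scheduler = DependencyAwareScheduler()
scheduler.submit("j1")
scheduler.submit("j2")
scheduler.submit("j3", depends_on=["j1"])
scheduler.submit("j4")

dispatched = [scheduler.next_ready() for _ in range(3)]  # j3 is skipped, j4 runs
scheduler.mark_finished("j1")
dispatched.append(scheduler.next_ready())                # now j3 is ready
```

Note that j3 keeps its place at the front of the pending list while it waits, so it is picked up as soon as j1 is marked finished, without going around the whole queue again.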

Pushing items to the back of the queue is a good workaround if it's relatively cheap to do so, the queue is relatively short, and there are few dependencies. If the item you push to the back is also a dependency of other tasks, those tasks too need to be pushed to the back of the queue when they arrive at the front (or at once, but that's unnecessarily hard). If the queue is long, you could see unexpected delays. For example, if the queue is a day long, you could end up waiting days for a task to finish. If that task is part of a chain of dependencies, the problem grows.
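A rough sketch of the requeue workaround, simulated with a single consumer. The function name `drain_with_requeue` and the `max_requeues` limit are assumptions added for illustration; the limit is there so that a missing or cyclic dependency does not make a job circulate forever.

```python
from collections import deque


def drain_with_requeue(jobs, depends_on, max_requeues=3):
    """Process jobs FIFO; a job with unfinished dependencies goes to the back.

    jobs: job ids, front of the queue first.
    depends_on: dict mapping a job id to the set of job ids it waits for.
    Returns the order in which jobs completed.
    """
    queue = deque(jobs)
    finished = set()
    order = []
    requeues = {}
    while queue:
        job = queue.popleft()
        if depends_on.get(job, set()) <= finished:
            # All dependencies are done, so "run" the job.
            finished.add(job)
            order.append(job)
        else:
            requeues[job] = requeues.get(job, 0) + 1
            if requeues[job] > max_requeues:
                raise RuntimeError(f"{job} requeued too often: missing or cyclic dependency?")
            queue.append(job)  # back of the queue, behind everything else
    return order


# Illustrative ordering where j3 reaches the front before j1 has finished.
completed = drain_with_requeue(["j3", "j1", "j2"], {"j3": {"j1"}})
```

Here j3 ends up behind j2 as well, even though j2 has nothing to do with it. With three jobs that is harmless, but it is the same effect that turns into days of delay on a long queue.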

Either way, you're going to need to know whether a task is queued, running, or finished. You could store this information in your favourite database, use some gossip protocol, or whatever you like. If executing the same job twice is not a correctness problem, you can use an AP system (in the CAP sense, with eventual consistency, such as a gossip protocol). If running the same task twice is going to mess things up badly, you'll need some consensus mechanism, like a single source of truth, such as your favourite SQL database or maybe Couchbase.
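As a sketch of the single-source-of-truth variant, the following uses SQLite from Python's standard library as a stand-in for "your favourite SQL database". The table layout and the helper names (`open_store`, `add_job`, `try_claim`, `mark_finished`) are assumptions for illustration. The point is that the dependency check and the queued-to-running transition happen in one atomic `UPDATE`, so two workers cannot both claim the same job.

```python
import sqlite3


def open_store(path=":memory:"):
    # Hypothetical schema: one row per job, one row per dependency edge.
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS jobs (id TEXT PRIMARY KEY, state TEXT NOT NULL)")
    db.execute("CREATE TABLE IF NOT EXISTS deps (job_id TEXT, depends_on TEXT)")
    return db


def add_job(db, job_id, depends_on=()):
    with db:  # commits on success, rolls back on error
        db.execute("INSERT INTO jobs VALUES (?, 'queued')", (job_id,))
        db.executemany("INSERT INTO deps VALUES (?, ?)", [(job_id, d) for d in depends_on])


def try_claim(db, job_id):
    """Move a job from 'queued' to 'running' only if every dependency is 'finished'.

    Returns True if this caller won the claim, False otherwise.
    """
    with db:
        cur = db.execute(
            """
            UPDATE jobs SET state = 'running'
            WHERE id = ? AND state = 'queued'
              AND NOT EXISTS (
                SELECT 1 FROM deps d
                LEFT JOIN jobs j ON j.id = d.depends_on
                WHERE d.job_id = ?
                  AND (j.state IS NULL OR j.state != 'finished')
              )
            """,
            (job_id, job_id),
        )
    return cur.rowcount == 1


def mark_finished(db, job_id):
    with db:
        db.execute("UPDATE jobs SET state = 'finished' WHERE id = ?", (job_id,))


db = open_store()
add_job(db, "j1")
add_job(db, "j3", depends_on=["j1"])

claimed_early = try_claim(db, "j3")   # j1 not finished yet
claimed_j1 = try_claim(db, "j1")      # first worker wins
claimed_twice = try_claim(db, "j1")   # second worker loses
mark_finished(db, "j1")
claimed_late = try_claim(db, "j3")    # dependencies met now
```

A worker that gets `False` back can requeue the job or leave it for the scheduler. An in-memory SQLite database only works within one process, so across machines you would point the same queries at a shared database server.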
