Reputation: 3845
This is about topological sorting dependent tasks for optimal scheduling. Suppose we have 4 tasks: A,B,C and a forth one D which depends on the other three. We can display that in a dependency graph like follows:
*---> A
/
D -----> B
\
*---> C
If we want to process all the tasks, we can do a topological sort which yields for example A,B,C,D or any other order where D comes last. If we have a single worker, it can process tasks according to that ordering.
Now let's say we have two workers W1 and W2. Now a possible scheduling policy could be: "If there is a task without dependencies process it. Otherwise wait until any task finished and check again".
For example: W1 pulls A. W2 pulls B. W1 finishes and pulls C. W2 finishes and waits. W1 finishes and pulls D.
As we can see, one of the workers must wait at some time and in general we cannot do much against it.
However
if we assign runtime estimations to the tasks we can optimize the global runtime. Let's say the tasks get the following time estimations:
*---> A (5s)
/
D (2s) -----> B (10s)
\
*---> C (5s)
Obviously it would be suboptimal if one worker starts task A and the other starts task C. They would be finishing almost at the same time and one of them would have to wait until the other one finished the long running task B.
Much better would be a policy where W1 starts task B and W2 processes A and C in the meantime. This decreases the estimated total processing time from 17s to 12s.
Such a policy could be formulated like: "Take all the tasks without pending dependencies and process the one with the longest path to the graph root. If there is no task without pending dependencies, wait."
That means we assign each task a priority P which is the sum of all nodes to the root* of the dependency tree and start with the highest priorities.
P(A) = 5s + 2s = 7s
P(B) = 10s + 2s = 12s
P(C) = 5s + 2s = 7s
Now my questions
*
The algorithm also works for graphs with cross dependencies like:
*---> A (12s)
/
D (2s) -----> B (10s) -*----> X (5s)
\ \
*---> C (5s) ---*--> Y (6s)
The priorities are the maximum of the runtimes of all paths:
P(A) = 12s + 2s = 14s
P(X) = 5s + 10s + 2s = 17s
P(Y) = 6s + 10s + 2s = 18s (and not the runtime of the "shorter" path over C)
Upvotes: 1
Views: 224
Reputation: 3615
This is a job scheduling problem. Finding the optimal schedule (that minimizes the makespan) is NP-hard, even when there are no dependencies, by reduction from the partition problem. See https://en.wikipedia.org/wiki/Identical-machines_scheduling.
If you want the optimal solution, you can express this as an instance of integer linear programming and solve it with an ILP solver, but in the worst case this could take exponential time.
Upvotes: 1