Regis May
Regis May

Reputation: 3466

Algorithm to order tasks with dependencies

In a private open source project I encounter the following problem:

There are a variety of tasks to perform. Some of these tasks will have annotations that

I'm looking for an easy algorithm of how to build a directed graph out of that information that then could be used for cycle detection and performing all task tasks in an order that allows respects all these conditions about their order of execution.

Q: What would be an efficient, good way to build such a graph?

Thank you for your help.

Note: It is clear we will require two extra nodes in the graph: A starting node and an end node. Let's name them START and END. It is clear that a node without dependency must end up in a construct such as START -> A -> END. But it is less clear to me to find a good way in how to end up in a START -> B -> C -> END sequence given that B must be followed by C while not getting an edge from B to END and no edge from START to C.

Upvotes: 1

Views: 3718

Answers (2)

Regis May
Regis May

Reputation: 3466

There is a "killer" feature in the requirements that prevents an easy solution. There are not one but two directions of constraints:

  • task A must be performed after task B => A "depends" on B
  • task A must be performed before task B => B "depends" on A

In the real world situation I'm facing these dependencies are even more complicated but that doesn't matter: All can be broken down to an analog situation as just described.

Now the algorithm:

Step 1: Compile all these constraints of each task into a set of single direction dependencies for each task. Single direction dependencies can then be handled real easily. The first idea of building a graph first, performing cycle detection second and then performing topological sorting (thanks for the term, Dimitry) can then be abandoned altogether. So each item ends up with a set of dependencies:

  • If task A must be performed after task B we store "B" in the dependency set of A.
  • If task A must be performed before task B we store "A" in the dependency set of B.

While doing this we can even do sanity checks for these dependencies. If there's something wrong in the constraint specifications we can easily detect this in this step.

Step 2: Now we have a very simple problem as there are only one way dependencies. These can be considered to be preconditions: A task can only be performed if all preconditions are met. Now we can proceed as follows:

pack all tasks into a list named /notYetProcessed/
create empty list /result/
while there are still tasks to process in /notYetProcessed/ do:
    create empty list /remaining/
    for all tasks /t/ in /notYetProcessed/ do:
        if all dependencies are met for /t/ then do:
            add /t/ to /result/
        else do:
            add /t/ to /remaining/
    if /length(notYetProcessed)/ matches /length(remaining)/ then do:
        terminate with error
    /notYetProcessed/ = /remaining/

After the outer while condition terminated result will contain a list of tasks to process in an order that follows all constraints as defined in the beginning.

If the above algorithm terminates with an error that means: * no task could be processed in this loop * which means: some dependencies could not be resolved * which means: a task dependency cycle exists involving exactly the remaining tasks

Step 3: Now process all tasks one by one as stored in result, one by one and you'll be fine.

As you can see this can be done without even building a special graph representation of the data. Topological sorting is performed "directly" on the data by accepting the first solution (= first variant of all possible sort orders) we can get our hands on.

There might be some even more efficient algorithms to solve this (after the first dependency compilation has been performed) I might not be aware of. If so I'd be happy to learn about them!

Upvotes: 2

Imran
Imran

Reputation: 13468

You can start with any order and then walk that order, swapping any elements that are out of order. You can repeat this until no more tasks are out of order.

I would use a hash table (or simply an array) for quick look-ups to determine if tasks are out of order.

Pseudocode:

class Task:
    id: int # serial id of task, ie 1..n
    not_before: array[int] # ids of tasks this task cannot precede
    not_after: array[int] # ids of tasks this task cannot come after

tasks: array[Task] = ... # tasks in order of ids

order: array[int] = [1,2,...,n] # task ids in initial order

positions: array[int] = ... # positions[i] is the index of task i in order array

def swap_tasks(i, j):
    swap(order[positions[i]], order[positions[j]])
    swap(positions[i], positions[j])

repeat:
    made_swap = False
    for i in 0..n: # loop over task ids
        for j in tasks[i].not_before:
            if positions[i] < positions[j]:
                swap_tasks(i, j)
                made_swap = True
        for j in tasks[i].not_after:
            if positions[i] > positions[j]:
                swap_tasks(i, j)
                made_swap = True
    if made_swap == False:
        break

For n tasks and O(k) constraints per tasks this should run in O(n²log(k)), since a task can move at most n times (since it can't go back past the position of the last task it was swapped with).

I thought about processing tasks in order and inserting not_after tasks, followed by task, followed by not_before tasks, and then inserting (or moving if they already appear) subsequent tasks to satisfy constraints, but this doesn't really seem to help since not_before and not_after tasks can be out of order with respect to each other, so we still need lots of swapping.

Upvotes: -1

Related Questions