Reputation: 293
I have a DAG with N nodes, i.e., 1, 2, ..., N
, and each node has a weight (we can call it time) x_1, x_2, ..., x_N
. I want to do a topological sorting but the difficulty is that I have an objective function when sorting. My objective function is to minimize the total time between several pairs of nodes.
For example, I have a DAG with 6 nodes, and I want a specific topological sorting such that (1,3) + (2,4)
is minimized, where (A,B)
denotes the time between two nodes A and B. For instance, if we have a sort [1, 6, 3, 2, 5, 4, 7]
, (1,3) = x_6
and (2,4) = x_5
. Based on the DAG, I want to find a sorting that minimizes (1,3) + (2,4)
.
I have been thinking this problem for a while. Generate all possible topological sorts (reference link) and calculate the objective function one by one is always a possible solution but it takes too much time if N is large. I was also suggested to use branch-bound pruning when generating all possible sorts (I am not very familiar branch-bound but I think that won't dramatically reduce the complexity).
Any (optimal or heuristic) algorithm for this kind of problem? It would be perfect if the algorithm can also be applied to other objective functions such as minimizing the total starting time for some nodes. Any suggestion is appreciated.
PS: Or alternatively, is it possible to formulate this problem as a linear integer optimization problem?
Upvotes: 15
Views: 2200
Reputation: 361
Here's an idea:
For simplicity, first suppose that you have a single pair to optimize (I'll comment on the general case later,) and suppose you already have your graph topologically sorted into an array.
Take the array segment starting at the lower (in terms of your topological order) node of the pair, say l
, and ending at the higher node, say h
. For every single node sorted between l
and h
, calculate whether it is bounded from below by l
and/or bounded from above by h
. You can calculate the former property by marking nodes in an "upward" BFS from l
, cutting at nodes sorted above h
; and similarly, the latter by marking in a "downward" BFS from h
, cutting at nodes sorted below l
. The complexity of either pass will be O( B*L ), where B
is the branching factor, and L
is the number of nodes originally sorted between l
and h
.
Now, all nodes that are not bounded from above by h
can be moved above h
, and all nodes that are not bounded from below by l
can be moved below l
(the two sets may overlap,) all without violating the topological sorting of the array, provided that that the original sorted order within each group of nodes relocated upward or downward is preserved.
This same procedure can be applied to as many pairs as needed, provided that the segments that they cut from the original sorting order do not overlap.
If any two pairs overlap, say (l1, h1)
and (l2, h2)
, so that e.g. l1 < l2 < h1 < h2 in the original sorted order, you have the following two cases:
1) In the trivial case where h1
and l2
happen to be unrelated in the topological order, then you should be able to optimize the two pairs mostly independently from each other, only applying some care to move either l2
above h1
or h1
below l2
(but not e.g. h1
below l1
, if that should turn out to be possible.)
2) If l2 < h1 in the topological order, you can treat both pairs as the single pair (l1, h2)
, and then possibly apply the procedure once more to (l2, h1)
.
As it's not clear what the complete process will achieve in the nontrivial overlapping case, especially if you have more complicated overlapping patterns, it may turn out to be better to treat all pairs uniformly, regardless of overlapping. In this case, the order can be processed repeatedly while each run yields an improvement over the previous (I'm not sure if the procedure will be monotonic in terms of the objective function though -- probably not.)
Upvotes: 0
Reputation: 68728
One way to solve this is as follows:
First we run All-Pairs shortest path algorithm Floyd-Warshall. This algorithm can be coded in essentially 5 lines of code and it runs in O(V^3)
time. It generates the shortest paths between each of the pair of vertices in the graph, i.e., it generates V X V matrix of shortest paths as its output.
It's trivial to modify this algorithm so that we can also get the count of vertices included in each of the O(N^2)
paths. So now we can eliminate all paths that have less than N vertices. For the remaining paths, we order them by their cost and then test each of them to see if topological sort property is not violated. If this property is not violated then we have found our desired result.
The last step above, i.e., testing topological sort can be performed in O(V+E) for each of the O(V^2) paths. This yields worst case runtime of O(V^4)
. However in practice this should be fast because Floyd-Warshall can be made very cache friendly and we would be testing only small fraction of O(N^2) paths in reality. Also if your DAG is not dense then you might be able to optimize topological testing as well with appropriate data structures.
Upvotes: 1