Path from s to e in a weighted DAG graph with limitations

Question

Consider a directed graph with n nodes and m edges. Each edge is weighted. There is a start node s and an end node e. We want to find the path from s to e that has the maximum number of nodes such that:

the total distance is less than some constant d

starting from s, each node in the path is closer than the previous one to the node e. (as in, when you traverse the path you are getting closer to your destination e. in terms of the edge weight of the remaining path.)

We can assume there are no cycles in the graph. There are no negative weights. Does an efficient algorithm already exist for this problem? Is there a name for this problem?

G. Bach · Accepted Answer

Whatever you end up doing, do a BFS/DFS starting from s first to see if e can even be reached; this only takes you O(n+m) so it won't add to the complexity of the problem (since you need to look at all vertices and edges anyway). Also, delete all edges with weight 0 before you do anything else since those never fulfill your second criterion.

EDIT: I figured out an algorithm; it's polynomial, depending on the size of your graphs it may still not be sufficiently efficient though. See the edit further down.

Now for some complexity. The first thing to think about here is an upper bound on how many paths we can actually have, so depending on the choice of d and the weights of the edges, we also have an upper bound on the complexity of any potential algorithm.

How many edges can there be in a DAG? The answer is n(n-1)/2, which is a tight bound: take n vertices, order them from 1 to n; for two vertices i and j, add an edge i->j to the graph iff i. This sums to a total of n(n-1)/2, since this way, for every pair of vertices, there is exactly one directed edge between them, meaning we have as many edges in the graph as we would have in a complete undirected graph over n vertices.





How many paths can there be from one vertex to another in the graph described above? The answer is 2^n-2. Proof by induction:

Take the graph over 2 vertices as described above; there is 1 = 2⁰ = 2^2-2 path from vertex 1 to vertex 2: (1->2).

Induction step: assuming there are 2^n-2 paths from the vertex with number 1 of an n vertex graph as described above to the vertex with number n, increment the number of each vertex and add a new vertex 1 along with the required n edges. It has its own edge to the vertex now labeled n+1. Additionally, it has 2^i-2 paths to that vertex for every i in [2;n] (it has all the paths the other vertices have to the vertex n+1 collectively, each "prefixed" with the edge 1->i). This gives us 1 + Σⁿ_k=2 (2^k-2) = 1 + Σ^n-2_k=0 (2^k-2) = 1 + (2^n-1 - 1) = 2^n-1 = 2^(n+1)-2.



So we see that there are DAGs that have 2^n-2 distinct paths between some pairs of their vertices; this is a bit of a bleak outlook, since depending on weights and your choice of d, you may have to consider them all. This in itself doesn't mean we can't choose some form of optimum (which is what you're looking for) efficiently though.



EDIT: Ok so here goes what I would do:


Delete all edges with weight 0 (and smaller, but you ruled that out), since they can never fulfill your second criterion.
Do a topological sort of the graph; in the following, let's only consider the part of the topological sorting of the graph from s to e, let's call that the integer interval [s;e]. Delete everything from the graph that isn't strictly in that interval, meaning all vertices outside of it along with the incident edges. During the topSort, you'll also be able to see whether there is a 

path from s to e, so you'll know whether there are any paths s-...->e. Complexity of this part is O(n+m).
Now the actual algorithm:
   
        traverse the vertices of [s;e] in the order imposed by the topological 
            sorting
        for every vertex v, store a two-dimensional array of information; let's call it 
            pre_v[][] since it's gonna store information about the predecessors
            of a node on the paths leading towards it
        in pre_v[i][j], store how long the total path of length (counted in
        vertices) i is as a sum of the edge weights, if j is the predecessor of the
        current vertex on that path. For example, pre_s+1[1][s] would have
            the weight of the edge s->s+1 in it, while all other entries in pre_s+1
            would be 0/undefined.
        when calculating the array for a new vertex v, all we have to do is check
            its incoming edges and iterate over the arrays for the start vertices of those
            edges. For example, let's say vertex v has an incoming edge from vertex w,
            having weight c. Consider what the entry pre_v[i][w] should be.
            We have an edge w->v, so we need to set pre_v[i][w] in v to 

            min(pre_w[i-1][k] for all k, but ignore entries with 0) + c (notice the subscript of the array!); we effectively take the cost of a
            path of length i - 1 that leads to w, and add the cost of the edge w->v.
            Why the minimum? The vertex w can have many predecessors for paths of length
            i - 1; however, we want to stay below a cost limit, which greedy minimization
            at each vertex will do for us. We will need to do this for all i in [1;s-v].
    While calculating the array for a vertex, do not set entries that would give you
        a path with cost above d; since all edges have positive weights, we can only get
            more costly paths with each edge, so just ignore those.
        Once you reached e and finished calculating pre_e, you're done with this
        part of the algorithm.
   
Iterate over pre_e, starting with pre_e[e-s]; since we have no cycles, all
    paths are simple paths and therefore the longest path from s to e can have e-s edges. Find the largest
    i such that pre_e[i] has a non-zero (meaning it is defined) entry; if non exists, there is no path fitting your criteria. You can reconstruct
    any existing path using the arrays of the other vertices.


Now that gives you a space complexity of O(n^3) and a time complexity of O(n²m) - the arrays have O(n²) entries, we have to iterate over O(m) arrays, one array for each edge - but I think it's very obvious where the wasteful use of data structures here can be optimized using hashing structures and other things than arrays. Or you could just use a one-dimensional array and only store the current minimum instead of recomputing it every time (you'll have to encapsulate the sum of edge weights of the path together with the predecessor vertex though since you need to know the predecessor to reconstruct the path), which would change the size of the arrays from n² to n since you now only need one entry per number-of-nodes-on-path-to-vertex, bringing down the space complexity of the algorithm to O(n²) and the time complexity to O(nm). You can also try and do some form of topological sort that gets rid of the vertices from which you can't reach e, because those can be safely ignored as well.

Path from s to e in a weighted DAG graph with limitations

Answers (1)

Related Questions