Badf
Badf

Reputation: 345

Check if there exist a path between two vertices in directed acyclic graph - queries

This question can be easily solved in O(n + m) per query, however is this possible to answer such queries in better complexity with preprocessing better than O(n²) ?

In tree it can be easily done by working with pre-order and in-order. I tried something similar in DAG but it doesn't make any sense.

I also tried to change this problem into LCA in DAG problem, but finding LCA in DAG can't be solved fast enough.


To be precise with constraints let's say:

n - number of vertices, up to 10^5

m - number of edges, up to 10^5

q - number of queries, up to 10^5

Upvotes: 13

Views: 2803

Answers (5)

Dave
Dave

Reputation: 9085

Since we have:

n - number of vertices, up to 10^5
m - number of edges, up to 10^5

Then, at least as you explore larger digraphs in your range, m is O(n), so O(n+m) = O(n).

  • Topologically sort the nodes, and associate each one with its rank in sorted order
  • associate each node with an n-bit integer or bitvector called 'predecessors'. The i'th bit of predecessors will indicate whether or not the given node can be reached from node i in the topo sort.

Now, parse the nodes in topo-sorted order.

Say we reach node X

  • if X has no parents, we do nothing
  • if X has parents P1, P2, ..., Pk, then: X.predecessors |= (P1.predecessors | P1.rank | P2.predecessors | P2.rank | ... | Pk.predecessors | Pk.rank)

There are n nodes to parse. Updating predecessors takes O(n) amortized time since there is one predecessor per arc and, if we accept that m = O(n) at least for large values of m & n, there are m = O(n) arcs.

Then, given a query, say for nodes X & Y,

return (X.predecessors & Y.rank) || (X.rank & Y.predecessors)

Upvotes: 0

eternal_learner
eternal_learner

Reputation: 66

If you are looking at serving fast multiple path_exists(src_vertex, dest_vertex) queries with some preprocessing, consider invoking Warshall's algorithm to determine the transitive closure which tells if a path exists between any two arbitrary nodes in a directed graph, as a preprocessing step (possibly at server startup). The algorithm has a worst case time complexity of O(n^3), and space complexity of O(n^2) ,where n is the number of nodes. The output of this algorithm is a nxn matrix where matrix[i][j] = 1 if there's a path between node_i and node_j, and zero otherwise. So, at query serving time, you could just return the result in O(1) time by looking the (i,j) pair up in the transitive closure matrix.

Upvotes: 0

Adder
Adder

Reputation: 25

Any algorithm for the reachability querying on DAGs can be used for answering such query on general directed graphs by compressing the strong connected components. So I do not think the DAG condition can do anything on reducing the complexity.

As for the reachability querying on directed graph, index construction techniques mentioned in this paper might be helpful for some cases.

Upvotes: 0

Erik P.
Erik P.

Reputation: 1627

I have a feeling that there might be a solution along the following lines, but this is by no means a full solution.

Let S be a subset of vertices. For each vertex V in the graph, consider the set D_S(V), which I define as follows: D_S(V) = {V} if V in S, and otherwise, D_S(V) is the union of {V} with the sets D_S(W) for all direct descendants W of V. (That is, it is "all eventual descendants of V, but stop the recursion whenever you hit an element of V".) The question is: can we find a set S such that the size of S is O(f(N)) and also D_S(V) is of size O(g(N)) for all V, and where f and g are asymptotically sublinear? (Maybe we can achieve sqrt for both, for example.)

If we can find this, then I suggest the following strategy. For preprocessing, create for each U in S a hash table containing all vertices eventually reachable from U. This can be achieved in O(f(N) * M); that's not necessarily better than O(N^2), but better than O(M*Q) at least.

Now to answer a query "is U reachable from V?", this is trivial if V in S. Otherwise, we check if V = U, in which case it's also trivial. Finally, we apply the same process to all descendants of V, recursively, returning "yes" if the answer is "yes" thru either of the two cases above, but "no" only if we never find U. This recursion takes up to O(g(N)) steps.

The question that remains is how to choose S. I think if the graph arises from some process where the out-degree follows a power law, one might just take the sqrt(N) vertices with the highest out-degrees. But for example if we have the graph on N=2*K vertices (i, 0) and (i, 1), with K^2 edges: from each (i, 0) to each (j, 1); then there is no suitable subset S. But maybe graphs that do not have a suitable S have by necessity a degree of uniformity we can exploit... Or maybe not. I don't know. Any ideas, let me know!

Upvotes: 0

shx2
shx2

Reputation: 64398

Interesting question. My intuition says "no". I haven't thought it through though.

However (assuming this question is not a theoretical one), for practical purposes, you can use a Bloom filter.

One possible solution to your problem using a Bloom filter would first generate K different orders of the graph, and for each, store the mapping from a node to its index in that order. Then, to test "reachability" from N1 to N2, you check (foreach order) if index-of-N1 is less than index-of-N2 (this check is O(1)). If this holds for all of the orders, it is reachable with pretty good probability (assuming K is big enough). (Depending on your real world use case, it might even be OK to gerenate such false positives occasionally, or you can then run the reliable O(N+M) check). Else, it is definitely not.

Upvotes: 0

Related Questions