pnadeau
pnadeau

Reputation: 603

DAG - Algorithm to ensure there is a single source and a single sink

I have to ensure that a graph in our application is a DAG with a unique source and a unique sink.

Specifically I have to ensure that for a given start node and end node (both of which are known at the outset) that every node in the graph lies on a path from the start node to the end node.

I already have an implementation of Tarjan's algorithm which I use to identify cycles, and a a topological sorting algorithm that I can run once Tarjan's algorithm reports the graph is a DAG.

What is the most efficient way to make sure the graph meets this criterion?

Upvotes: 6

Views: 4692

Answers (4)

chill
chill

Reputation: 16888

First, do a DFS on the graph, starting from the source node, which, as you say, is known in advance. If you encounter a back edge[1], then you have a cycle and you can exit with an error. During this traversal you can identify if there are nodes, not reachable from the source[2] and take the appropriate action.

Once you have determined the graph is a DAG, you can ensure that every node lies on a path from the source to the sink by another DFS, starting from the source, as follows:

bool have_path(source, sink) {
    if source == sink {
        source.flag = true
        return true
    }

    // traverse all successor nodes of `source`
    for dst in succ(source) {
        if not dst.flag and not have_path(dst, sink)
           return false // exit as soon as we find a node with no path to `sink`
    }
    source.flag = true;
    return true
}

The procedure have_path sets a boolean flag in each node, for which there exists some path from that node to the sink. In the same time the procedure traverses only nodes reachable from the source and it traverses all nodes reachable from the source. If the procedure returns true, then all the nodes, reachable from the source lie on a path to the sink. Unreachable nodes were already handled in the first phase.


[1] an edge, linking a node with a greater DFS number to an node with a lesser DFS number, that is not already completely processed, i.e. is still on the DFS stack

[2] e.g. they would not have an assigned DFS number

Upvotes: 0

edwin
edwin

Reputation: 387

If your algorithm takes as input a DAG, which is weakly connected, assume that there are only one node s whose in-degree is zero and only one node t whose out-degree is zero, while all other nodes have positive in-degree and out-degree, then s can reach all other nodes, and all other nodes can reach t. By contradiction, assume that there is a node v that s cannot reach. Since no nodes can reach s, that is, v cannot reach s too. As such, v and s are disconnected, which contradicts the assumption. On the other hand, if the DAG is not weakly connected, it definitely does not satisfy the requirements that you want. In sum, you can first compute the weakly connected component of the DAG simply using BFS/DFS, meanwhile remembering the number of nodes whose in-degree or out-degree is zero. The complexity of this algorithm is O(|E|).

Upvotes: 0

Yuushi
Yuushi

Reputation: 26040

A simple breadth or depth-first search should satisfy this. Firstly you can keep a set of nodes that comprise sink nodes that you've seen. Secondly, you can keep a set of nodes that you've discovered using BFS/DFS. Then the graph will then be connected iff there is a single connected component. Assuming you're using some kind of adjacency list style representation for your graph, the algorithm would look something like this:

create an empty queue
create an empty set to store seen vertices
create an empty set for sink nodes

add source node to queue

while queue is not empty
    get next vertex from queue, add vertex to seen vertices
    if num of adjacent nodes == 0
        add sink nodes to sink node set
    else
        for each node in adjacent nodes
        if node is not in seen vertices
            add node to queue

return size of sink nodes == 1 && size of seen vertices == total number in graph

This will be linear in the number of vertices and edges in the graph.

Note that as long as you know a source vertex to start from, this will also ensure the property of a single source: any other vertex that is a source won't be discovered by the BFS/DFS, and thus the size of the seen vertices won't be the total number in the graph.

Upvotes: 0

templatetypedef
templatetypedef

Reputation: 372804

If your graph is represented by an adjacency matrix, then node x is a source node if the xth column of the matrix is 0 and is a sink node if row x of the matrix is 0. You could run two quick passes over the matrix to count the number of rows and columns that are 0s to determine how many sources and sinks exist and what they are. This takes time O(n2) and is probably the fastest way to check this.

If your graph is represented by an adjacency list, you can find all sink nodes in time O(n) by checking whether any node has no outgoing edges. You can find all sinks by maintaining for each node a boolean value indicating whether it has any incoming edges, which is initially false. You can then iterate across all the edges in the list in time O(n + m) marking all nodes with incoming edges. The nodes that weren't marked as having incoming edges are then sources. This process takes time O(m + n) and has such little overhead that it's probably one of the fastest approaches.

Hope this helps!

Upvotes: 1

Related Questions