Sampling random nodes from a DAG

Question

I have a large directed acyclic graph (DAG) from which I would like to efficiently draw a sample node according to the following criteria:

I specify a fixed node A that must never be sampled
Nodes that directly or indirectly refer to A are never sampled
All other nodes are sampled with equal probability

Nodes are stored as objects with pointers to the other nodes that they refer to. The entire graph can be reached from a single root node that refers to everything else directly or indirectly.

Is there a good algorithm to do this? Ideally without requiring large amounts of additional memory since the DAG is large!

aioobe · Accepted Answer

The only solution I can come up with is to

1. put the nodes in a hash set (traverse them from the root using, say, a breadth-first traversal), O(|E|+|V|)
2. start from node A and remove A and all of its predecessors by traversing the edges backwards (again O(|E|+|V|))
3. select a random node from the remaining nodes.

This would result in an O(|E|+|V|) algorithm with an O(|V|) memory requirement.

Note that you wouldn't have to copy the nodes in step 1, only save a reference to the node.
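
The steps above could be sketched roughly as follows in Python. This is only an illustration under some assumptions: the `Node` class, its `children` attribute, and the function name `sample_excluding` are made up for the example. Since the question says nodes only hold forward pointers, the sketch records reverse edges in a `parents` map during the first traversal so that step 2 can walk backwards; that map costs O(|E|) additional memory on top of the hash set.

```python
import random
from collections import deque


class Node:
    """Hypothetical node type: stores forward pointers only."""

    def __init__(self, name):
        self.name = name
        self.children = []


def sample_excluding(root, a, rng=random):
    # Step 1: breadth-first traversal from the root. Collect a reference
    # to every node and record the reverse edges along the way.
    candidates = {root}
    parents = {}  # node -> list of nodes that point at it
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in node.children:
            parents.setdefault(child, []).append(node)
            if child not in candidates:
                candidates.add(child)
                queue.append(child)

    # Step 2: walk the reverse edges from A, discarding A itself and
    # every node that refers to A directly or indirectly.
    candidates.discard(a)
    stack = [a]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent in candidates:
                candidates.remove(parent)
                stack.append(parent)

    # Step 3: uniform choice among the remaining nodes (None if empty).
    if not candidates:
        return None
    return rng.choice(list(candidates))
```

Note that when A is reachable from the root, the root itself is always removed in step 2, so the function returns `None` if every node refers to A. If many samples are needed for the same A, the surviving set could be computed once and reused.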
