mikera
mikera

Reputation: 106361

Sampling random nodes from a DAG

I have a large directed, acylic graph (DAG) from which I would like to efficiently draw a sample node according to the following criteria:

  1. I specify a fixed node A that must never be sampled
  2. Nodes that directly or indirectly refer to A are never sampled
  3. All other nodes are sampled with equal probability

Nodes are stored as objects with pointers to the other nodes that they refer to, the entire graph can be reached from a single root node that refers to everything else directly or indirectly.

Is there a good algorithm to do this? Ideally without requiring large amounts of additional memory since the DAG is large!

Upvotes: 2

Views: 398

Answers (2)

aioobe
aioobe

Reputation: 421040

The only solution I can come up with is to

  1. put the nodes in a hash set
    (traverse them from the root using, say, a breadth first traversal), O(|E|+|V|)

  2. start from node A and remove all predecessors by traversing the edges backwards
    (again O(|E|+|V|))

  3. select a random node from the remaining nodes.

This would result in a O(|E|+|V|) algorithm with a O(|V|) memory requirement.

Note that you wouldn't have to copy the nodes in step 1, only save a reference to the node.

Upvotes: 2

The Alchemist
The Alchemist

Reputation: 3425

I'm not an expert in this area by any means, but I think you may want a Monte Carlo Markov chain sampling method such as the Metropolis-Hastings or Gibbs sampling algorithm.

You can find some code samples online, and you might have to modify the code to do exactly what you want it to do. There's some good presentations on the topic, like this.

Some software that might help you along are:

I don't know your familiarity with graph theory, so I'm not sure how difficult it would be for you to implement this.

Hope this was helpful...

Upvotes: 0

Related Questions