Reputation: 347

Select a Node at Random from Unbalanced Binary Tree

One of my friends had the following interview question, and neither of us are quite sure what the correct answer is. Does anyone have an idea about how to approach this?

Given an unbalanced binary tree, describe an algorithm to select a node at random such that each node has an equal probability of being selected.

Upvotes: 5

Answers (4)

NielW

Reputation: 3774

I implemented @jim-mischel's algorithm in C# and it works great:

private void SelectRandomNode(ref int count, Node curNode, ref Node selectedNode)
{
    foreach( var childNode in curNode.Children )
    {
        ++count;

        if( random.Next(count) == count - 1 )
            selectedNode = childNode;

        SelectRandomNode(ref count, childNode, ref selectedNode);
    }
}

Call it like this:

var count = 1;
Node selected = root;
SelectRandomNode(ref count, root, ref selected);

Upvotes: 0

user2329744

Reputation: 111

We can do this recursively in one parse by selecting the random node while parsing the tree and counting the number of nodes in left and right sub tree. At every step in recursion, we return the number of nodes at the root and a random node selected uniformly randomly from nodes in sub tree rooted at root.

Let's say number of nodes in left sub tree is n_l and number of nodes in right sub tree is n_r. Also, randomly selected node from left and right subtree be R_l and R_r respectively. Then, select a uniform random number in [0,1] and select R_l with probability n_l/(n_l+n_r+1) or select root with probability 1/(n_l+n_r+1) or select R_r with probability n_r/(n_l+n_r+1).

Upvotes: 1

Jim Mischel

Reputation: 134055

You can do it with a single pass of the tree. The algorithm is the same as with a list.

When you see the first item in the tree, you set it as the selected item.

When you see the second item, you pick a random number in the range (0,2]. If it's 1, then the new item becomes the selected item. Otherwise you skip that item.

For each node you see, you increase the count, and with probability 1/count, you select it. So at the 101st node, you pick a random number in the range (0,101]. If it's 100, that node is the new selected node.

When you're done traversing the tree, return the selected node. The operation is O(n) in time, with n being the number of nodes in the tree, and O(1) in space. No preprocessing required.

Upvotes: 9

Bernhard Barker

Reputation: 55619

Note

If you're only doing a single query, and you don't already have a count at each node, the best time complexity you can get is O(n), so the depth-first-search approach would be the best one.

For repeated queries, the best option depends on the given constraints
(the fastest per-query approach is using a supplementary array).

Supplementary array

O(n) space, O(n) preprocessing, O(1) insert / remove, O(1) query

Have a supplementary array containing all the nodes.

Also have each node store its own index (so you can remove it from the array in O(1) - the way to do this would be to swap it with the last element in the array, update the index of the node that was at the last index appropriately and decrease the size of the array (removing the last element).

To get a random node, simply generate a random index in the array.

Per-node count

Modified tree (O(n) space), N/A (or O(n)) preprocessing, O(depth) insert / remove, O(depth) query

Let each node contain the number of elements in its subtree.

When generating a random node, go left or right based on the value of a random number generated and the counts of the left or right subtrees.

// note that subtreeCount = leftCount + rightCount + 1
val = getRandomNumber(subtreeCount)
if val = 0
  return this node
else if val <= leftCount
  go left
else
  go right

Depth-first-search

O(depth) space, O(1) preprocessing, O(1) insert / remove, O(n) query

Count the number of nodes in the tree (if you don't already have the count).

Generate a random number between 0 and the number of nodes.

Simply do a depth-first-search through the tree and stop when you've processed the desired number of nodes.

This presumes a node doesn't have a parent member - having this will make this O(1) space.