Jessica
Jessica

Reputation: 2435

How to noisily select k smallest elements of an array?

So I wrote a function to find the k nodes of a graph that have the smallest degree. It looks like this:

def smallestKNodes(G, k):
    leastK = []
    for i in range(G.GetMxNId()):
        # Produces an iterator to the node
        node = G.GetNI(i)
        for j in range(k):
            if j >= len(leastK):
                leastK.append(node)
                break
            elif node.GetDeg() < leastK[j].GetDeg():
                leastK.insert(j, node)
                leastK = leastK[0:k]
                break
    return leastK[0:k]

My problem is when all the nodes have the same degree, it selects the same nodes every time. How can I make it so it takes all the nodes with zero degree or whatever and then selects k nodes randomly?

Stipulations:

(1) Suppose k = 7, then if there are 3 nodes with degree 0 and 10 nodes with degree 1, I would like to choose all the nodes with degree 0, but randomly choose 4 of the nodes with degree 1.

(2) If possible I don't want to visit any node twice because there might be too many nodes to fit into memory. There might also be a very large number of nodes with minimum degree. In some cases there might also be a very small number of nodes.

Upvotes: 1

Views: 132

Answers (2)

anatolyg
anatolyg

Reputation: 28320

You might want to do two passes, the first pass to discover the relevant degree you have to randomize, how many nodes of that degree to choose, and the total number of nodes with that degree. Then, do a second pass on your nodes, choosing only those with the desired degree at random.

To choose k nodes of n total so each node has a fair probability (k/n), loop over relevant nodes, and choose each one with probability 1, 1, ..., 1, k/(k+1), k/(k+2), ..., k/n. When choosing a node, if k nodes are already chosen, throw one of them away at random.

def randomNodesWithSpecificDegree(G, d, k, n):
    result = []
    examined = 0
    for i in range(G.GetMxNId()):
        # Produces an iterator to the node
        node = G.GetNI(i)
        if node.GetDeg() = d:
            examined = examined + 1
            if len(result) < k:
                result.append(node)
            elif random(0...1) < k / examined
                index = random(0...k-1)
                result[index] = node
    assert(examined = n)
    return result

This pseudo-code is good when k is small and n is big (seems your case).

Upvotes: 1

Jacob
Jacob

Reputation: 34621

Store all the nodes which satisfy your condition and randomly pick k nodes from it. You can do the random pick by shuffling the array (e.g. Fisher-Yates, std::shuffle, randperm, etc.) and picking the first k nodes (for example).

Upvotes: 2

Related Questions