Reputation: 2435
So I wrote a function to find the k nodes of a graph that have the smallest degree. It looks like this:
def smallestKNodes(G, k):
leastK = []
for i in range(G.GetMxNId()):
# Produces an iterator to the node
node = G.GetNI(i)
for j in range(k):
if j >= len(leastK):
leastK.append(node)
break
elif node.GetDeg() < leastK[j].GetDeg():
leastK.insert(j, node)
leastK = leastK[0:k]
break
return leastK[0:k]
My problem is when all the nodes have the same degree, it selects the same nodes every time. How can I make it so it takes all the nodes with zero degree or whatever and then selects k nodes randomly?
Stipulations:
(1) Suppose k = 7, then if there are 3 nodes with degree 0 and 10 nodes with degree 1, I would like to choose all the nodes with degree 0, but randomly choose 4 of the nodes with degree 1.
(2) If possible I don't want to visit any node twice because there might be too many nodes to fit into memory. There might also be a very large number of nodes with minimum degree. In some cases there might also be a very small number of nodes.
Upvotes: 1
Views: 132
Reputation: 28320
You might want to do two passes, the first pass to discover the relevant degree you have to randomize, how many nodes of that degree to choose, and the total number of nodes with that degree. Then, do a second pass on your nodes, choosing only those with the desired degree at random.
To choose k
nodes of n
total so each node has a fair probability (k/n
), loop over relevant nodes, and choose each one with probability 1, 1, ..., 1, k/(k+1), k/(k+2), ..., k/n. When choosing a node, if k
nodes are already chosen, throw one of them away at random.
def randomNodesWithSpecificDegree(G, d, k, n):
result = []
examined = 0
for i in range(G.GetMxNId()):
# Produces an iterator to the node
node = G.GetNI(i)
if node.GetDeg() = d:
examined = examined + 1
if len(result) < k:
result.append(node)
elif random(0...1) < k / examined
index = random(0...k-1)
result[index] = node
assert(examined = n)
return result
This pseudo-code is good when k
is small and n
is big (seems your case).
Upvotes: 1
Reputation: 34621
Store all the nodes which satisfy your condition and randomly pick k
nodes from it. You can do the random pick by shuffling the array (e.g. Fisher-Yates, std::shuffle
, randperm
, etc.) and picking the first k
nodes (for example).
Upvotes: 2