Reputation: 2700
I have a database with 3.4 million nodes and want to select a random node.
I tried using something like
MATCH (n)
WHERE rand() <= 0.01
RETURN n
LIMIT 1
but it seems like the algorithm always starts with the same nodes and selects the first one whose random number is below 0.01, which means in most cases the "random" node is one of the first 100 checked nodes.
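To put a rough number on that bias, here is a quick estimate. It assumes the nodes are always scanned in the same order and that each rand() call is independent, with p = 0.01 taken from the query above; k is just the position of a node in the scan order.

```latex
P(\text{node } k \text{ is returned}) = (1 - p)^{k-1}\, p, \qquad p = 0.01

P(k \le 100) = 1 - (1 - p)^{100} = 1 - 0.99^{100} \approx 0.63
```

So under these assumptions roughly 63% of runs return one of the first 100 scanned nodes, and nodes far down the scan order are effectively never picked.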
Is there a better query that selects a node uniformly at random from all of my nodes?
Upvotes: 2
Views: 1036
Reputation: 5754
This works and returns a different node on successive runs. It has to scan every node and assign each one a random value, though, so query performance with 3.4 million nodes may or may not be acceptable.
MATCH (n)
RETURN n
ORDER BY RAND()
LIMIT 1
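If the ordering turns out to be too slow, one possible variant is to skip a random number of rows instead. This is only a sketch: the 3400000 is the approximate node count from the question, not something the query works out itself, and it relies on SKIP accepting an expression that does not refer to matched nodes.

```cypher
// Assumption: 3400000 is close to the real node count (taken from the question).
// Skip a random number of rows, then return the next one.
MATCH (n)
RETURN n
SKIP toInteger(rand() * 3400000)
LIMIT 1
```

The skipped rows still have to be walked past, so this is not free, and if the random offset is larger than the actual number of nodes the query returns nothing and has to be run again.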
Upvotes: 0
Reputation: 11216
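You could generate a random ID by multiplying the result of the rand() function by the number of nodes, and then look the node up by that ID. Every ID is equally likely to be chosen, so this should not favor the nodes that happen to be scanned first.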
WITH toInteger(rand() * 3400000) AS rand_node
MATCH (n)
WHERE id(n) = rand_node
RETURN n
Once gaps appear in your node IDs (i.e. they are no longer perfectly contiguous due to deletes), the random ID will sometimes not exist and the query will return nothing. In that case you could widen the lookup to a few IDs on either side of the random number and return the first row of the result.
WITH toInteger(rand() * 3400000) AS rand_node, 5 AS offset
WITH range(rand_node - offset, rand_node + offset) AS rand_range
MATCH (n)
WHERE id(n) IN rand_range
RETURN n
LIMIT 1
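To avoid hard-coding 3400000, the count could be computed in the same query. This is a sketch under the assumption that the IDs are dense enough for count(n) to be close to the highest ID; total is a name introduced here for illustration.

```cypher
// Assumption: node IDs are nearly contiguous, so count(n) approximates the highest ID.
MATCH (n)
WITH count(n) AS total
// Pick one random ID and a small window around it, as in the query above.
WITH toInteger(rand() * total) AS rand_node, 5 AS offset
WITH range(rand_node - offset, rand_node + offset) AS rand_range
MATCH (n)
WHERE id(n) IN rand_range
RETURN n
LIMIT 1
```

If many nodes have been deleted, the highest ID can be well above the count, and nodes with IDs beyond total would then never be selected.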
Upvotes: 1