Reputation: 77
I have a discontinuous list of numbers N (e.g. { 1, 2, 3, 6, 8, 10 }) and I need to progressively create random pairs of numbers from N and store them in a list that must never contain the same pair twice.
For a list of 3 different numbers, for example, there are 6 possible pairs (not counting same-number pairs).
For the list { 4, 8, 9 }, the possible pairs are:
(4,8) (4,9) (8,4) (8,9) (9,4) (9,8)
With a number list of size 30, for example, there are 870 possible pairs, and my current method gets less and less efficient as the number of possible pairs grows.
For now, my strategy with a number list of size 30 is:
N = { 3, 8, 10, 15, 16, ... } // size = 30
// Let's say I already have a list of 200 different pairs
my_pairs = { (8,16), (23, 32), (16,10), ... }
// Get two random numbers in the list
rn1 = random(N)
rn2 = random(N)
Loop through my_pairs to see if the pair (rn1, rn2) has already been generated.
If it has, pick two new random numbers rn1 and rn2 and try again.
If it hasn't, add the pair to my_pairs.
The issue is that the more pairs there are in my_pairs, the less likely a random pair is to be new, so I end up trying several random pairs and walking through the whole list on every attempt.
I could generate all possible pairs at the start, shuffle that list, and pop one element each time I need a new random pair. But storing all possible pairs takes a lot of space as the number list grows (9900 possible pairs for 100 different numbers), and since I add numbers to N during the process, I can't afford to recompute all possible pairs every time.
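In Python, for example, that pre-generation idea would be roughly this (just a sketch; itertools.permutations enumerates exactly the ordered pairs without repeated elements):

import itertools
import random

N = [3, 8, 10, 15, 16]                           # current number list
all_pairs = list(itertools.permutations(N, 2))   # every ordered pair, once
random.shuffle(all_pairs)
next_pair = all_pairs.pop()                      # O(1) per draw, but O(n^2) memory up front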
Is there an algorithm for generating random unique pairs?
Maybe it would be faster using matrices, or storing my pairs in some sort of tree or graph?
Upvotes: 0
Views: 895
Reputation: 60918
It depends a lot on what you want to optimize for.
If you want to keep things simple and easy to maintain, having a hash set of all generated pairs sounds reasonable. The assumption here is that both checking membership and adding a new element are O(1) on average.
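In Python that could look like this (a rough sketch; the names are just illustrative):

import random

N = [3, 8, 10, 15, 16]        # your number list, can keep growing
picked = set()                # hash set of already generated pairs

def random_new_pair():
    # Rejection-sample until we hit a pair we haven't seen yet; membership
    # test and insertion on the set are O(1) on average.
    while True:
        a, b = random.choice(N), random.choice(N)
        if a != b and (a, b) not in picked:
            picked.add((a, b))
            return (a, b)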
If you worry about space requirements, because you regularly use up to 70% of the possible pairs, then you could optimize for space. To do that, I'd first establish a mapping between each possible pair and a single integer. I'd do so in a way that allows for easy addition of more numbers to N.
 i   +0      +1
 0   (0,1)   (1,0)
 2   (0,2)   (2,0)
 4   (1,2)   (2,1)
 6   (0,3)   (3,0)
 8   (1,3)   (3,1)
10   (2,3)   (3,2)
Something like this would map a single integer i to a pair (a,b) of indices into your sequence N, which you could then look up in N to turn them into an actual pair of elements. You can come up with formulas for this mapping, although the conversion from i to (a,b) will entail a square root somewhere.
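For example, one set of formulas that reproduces the table above (a sketch; the exact numbering scheme is a free choice, and a, b are indices into N):

from math import isqrt

def pair_to_index(a, b):
    # Number the unordered pairs {lo, hi} by their larger element, exactly as
    # in the table; the low bit of the result records whether the pair is reversed.
    lo, hi = min(a, b), max(a, b)
    k = hi * (hi - 1) // 2 + lo
    return 2 * k + (1 if a > b else 0)

def index_to_pair(i):
    # Inverse mapping; this is where the square root shows up.
    k, is_reversed = divmod(i, 2)
    hi = (1 + isqrt(1 + 8 * k)) // 2
    lo = k - hi * (hi - 1) // 2
    return (hi, lo) if is_reversed else (lo, hi)

With such a mapping, adding more numbers to N simply extends the range of valid indices to len(N) * (len(N) - 1) without disturbing any existing ones.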
When you have this, the task of picking a pair from a set of arbitrary numbers becomes the task of picking an integer from a continuous range of integers. Now you could use a bitmap to very efficiently store for each index whether you have already picked that one in the past. For low percentages of picked pairs that bitmap may be more memory-consuming than a hash map of only picked values would be, but as you approach 70% of all pairs getting picked, it will be way more efficient. I would expect a typical hash map entry to consume at least 3×64 = 192 bits of storage, so the bitmap will start saving memory once 1/192 ≈ 0.52% of all values are getting picked. Growing the bitmap might still be expensive, so estimating the maximal size of N would help you allocate enough memory up front.
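A sketch of the bitmap part in Python, sitting on top of the index mapping above (the helper names are made up):

import random

def make_bitmap(max_n):
    # One bit per possible pair index, sized for the estimated maximum of len(N).
    return bytearray((max_n * (max_n - 1) + 7) // 8)

def is_picked(bitmap, i):
    return (bitmap[i >> 3] >> (i & 7)) & 1

def mark_picked(bitmap, i):
    bitmap[i >> 3] |= 1 << (i & 7)

def random_new_index(bitmap, n):
    # Rejection sampling over the pair indices valid for the current len(N) == n.
    while True:
        i = random.randrange(n * (n - 1))
        if not is_picked(bitmap, i):
            mark_picked(bitmap, i)
            return i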
If you have a costly random number generator, or worry about the worst-case time complexity of the whole thing, then you might want to avoid multiple attempts that can land on already picked pairs. To achieve that, you would store the set of all picked pairs in some kind of search tree where each node also keeps track of how many leaves its subtree contains. That way you could generate a random number in a range corresponding to the number of pairs that haven't been picked yet, and then use the information in that tree to increase the chosen value by the number of already picked indices below it. I haven't worked out all the details, but I believe this can be turned into O(log n) worst-case time complexity, as opposed to the O(1) average case but O(n) or even O(∞) worst case we had before.
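As one possible (untested) realization of that idea, a Fenwick (binary indexed) tree over the pair indices finds the k-th still-unpicked index directly instead of shifting the drawn value, but the effect is the same O(log n) per draw; it assumes a known upper bound on the number of pair indices:

import random

class UnpickedSampler:
    """Draw each index in [0, size) at most once, in O(log size) per draw.
    The Fenwick tree stores a 1 for every index that is still unpicked,
    so "find the k-th unpicked index" becomes a prefix-sum search."""

    def __init__(self, size):
        self.size = size
        self.remaining = size
        self.tree = [0] + [i & -i for i in range(1, size + 1)]  # Fenwick tree of all ones

    def draw(self):
        if self.remaining == 0:
            raise ValueError("all indices have been picked")
        k = random.randrange(self.remaining)   # 0-based rank among unpicked indices
        pos, rank = 0, k + 1
        step = 1
        while step * 2 <= self.size:
            step *= 2
        while step:                            # binary-lifting prefix-sum search
            nxt = pos + step
            if nxt <= self.size and self.tree[nxt] < rank:
                rank -= self.tree[nxt]
                pos = nxt
            step //= 2
        index = pos                            # smallest index with k unpicked ones before it
        i = index + 1                          # mark it as picked
        while i <= self.size:
            self.tree[i] -= 1
            i += i & -i
        self.remaining -= 1
        return index

Combined with an index-to-pair mapping like the one sketched earlier, each draw then yields a fresh pair in O(log n).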
Upvotes: 1