alex
alex

Reputation: 11400

Random pairs from two lists

My question is similar to this one.

I have two lists: X with n elements and Y with m elements - let's say they hold row and column indices for a n x m matrix A. Now, I'd like to write something to k random places in matrix A.

I thought of two solutions:

  1. Get a random element x from X and a random element y from Y. Check if something is already written to A[x][y] and if not - write. But if k is close to m*n I can shoot like this for ever.
  2. Create an m*n array with all possible combinations of indices, shuffle it, draw first k elements and write there. But the problem I see here is that if both n and m are very big, the newly created n*m array may be huge (and shuffling may take some time too).
  3. Karoly Horvath suggested to combine the two. I guess I'd have to pick threshold t and:

.

if( k/(m*n) > t ){
    use option 2.
}else{
    use option 1.
}

Any advice on how to pick t then?

Are there any other (better) approaches I missed?

Upvotes: 2

Views: 1082

Answers (4)

Pandrei
Pandrei

Reputation: 4951

for a nxm matrix, you can consider [0..n*m-1] the indexes for the matrix elements.

Filling in a random index is rather trivial, just generate a random number between 0 and n*m-1, and that is the position to be filled.

Subsequently doing this operation can be a little more tricky:

  1. you can test weather you have already written something to a position and regenerate the random number; but as you fill the matrix you will have a larger number of index regeneration.
  2. a better solution is to put all the indexes in a vector of n*m elements. As you generate an index, you remove it from the list and next time generate a random index between 0 and N-1

example:

vector<int> indexVec;
for (i=0;i<n*m;i++)
    indexVec.push_back(i);
nrOfIndexes = n*m-1;
while (nrOfIndexes>1) 
{
   index = rand()% nrOfIndexes;
   processMatrixLocation(index); 
   indexVec.erase(indexVec.begin()+index);    
   nrOfIndexes--;
}
processMatrixLocation(indexVec[0]);

Upvotes: 0

Ulrich Eckhardt
Ulrich Eckhardt

Reputation: 17415

You have n x m elements, e.g. 200 elements for a 10 x 20 matrix. Picking one out of 200 should be easy. Point is, whatever you do, you can flatten the two dimensions into one, reducing that part of the problem.

Notes:

  • Use floor divide and modulo operations to get row and column out of the index.
  • Blacklist: Store the picked index in a set to quickly skip those that were already written.
  • Whitelist: Store the indices that are not yet picked in a set. If this is better than blacklisting depends on how full your set is.
  • Using the right container type for the set might come important, it doesn't have to be std::set. For the blacklist, you only need fast lookup and fast insertion, a vector<bool> might actually work pretty well. For the whitelist, you need fast random access and fast deletion, a vector<unsigned> with the remaining indices would be a good choice.
  • Prepare to switch between either method depending on the circumstances.

Upvotes: 0

David Eisenstat
David Eisenstat

Reputation: 65458

There's an elegant algorithm due to Floyd for sampling without replacement from a range of integers. You can map the resulting integers in [0, n*m) to coordinates by the C++ function [m](int i) { return std::make_pair(i / m, i % m); }.

Upvotes: 4

Rob
Rob

Reputation: 2656

The best approach depends on how full your resulting matrix will be.. If you are going to fill more than half of it your rate of collision (aka getting random spot that is already "written" to) is going to be high and will cause you to loop a lot more than you would want.

I would not generate all possibilities, but instead I would build it as you go using a lists of lists. One for all possible values of X and from that a list of possible values of Y. I would initialize the X list but not the Y ones. every time you pick a value of x for the first time you create a dictionary or list of m elements, then remove the one you use. then next time you pick x you will have m-1 elements, once a X value runs out of elements then remove it from the list so it does not get picked again.. this way you can guarantee never to pick a occupied space again, and you do not need to generate n*m possible options.

Upvotes: 0

Related Questions