SynAck
SynAck

Reputation: 427

Generate random numbers with equal probabilities in c++

For example, array of pairs of ranges p={[23, 28], [11, 14], [31, 39]}

You have to return random number such that this number should be:

  1. In any one of these pairs
  2. Each number in these ranges should have equal probability of being chosen.

I can easily generate a random number that lies between any of these intervals like

int x=(rand()%(p[i].second-p[i].first) )+ p[i].first;

How can I generate a random number that can lie in any interval in the array and that too with equal probability.

Upvotes: 1

Views: 1177

Answers (3)

jpmarinier
jpmarinier

Reputation: 4733

We could follow a common approach by defining a specialized generator class.

We can overload its () function call operator, so once we have an object of that class, we just get a variate thru variate = genObj(). Underneath, the class is to call standard C++11 (and beyond) random number generation facilities, as exposed for example in the ISO C++ N3551 paper.

Note that rand() and srand() are legacy C facilities.

In this example, there are 19 possible variates. We number them from 0 to 18. First, we generate uniformly a random integer in [0..18]. This is called the rank. Second, from the rank we compute the actual variate; this second step is known as unranking.

The generator class has a drawRank() method which returns a rank.

We define the cumulants of the range set as, for each range, the size of that range plus the sum of the sizes of all ranges before itself.

Once we have determined that the variate for a certain rank lies in the i-th range, unranking can be done like this:

variate = if (i == 0)  then  ranges[0].lowerBound + rank
                       else  ranges[i].lowerBound + (rank - cumulants[i-1])

Side note: as here the ranges are quite small, instead we could unrank with just a simple vector used as a lookup table, translating ranks to variates directly. But if some ranges become large, like [1000000,199999], this approach obviously becomes a memory wasting one.

So we can now proceed to write our specialized generator class.

First, the public part, with the #include clauses and the constructor:

#include  <cstdlib>
#include  <map>
#include  <random>
#include  <iostream>

using Ranges = std::vector<std::pair<int,int>>;

class RangesGen
{
public:
   // Constructor:
   RangesGen(int randomSeed_, const Ranges& ranges_) :
       randomSeed(randomSeed_), ranges(ranges_),
       engine(randomSeed_), ud(0, getTotalCount(ranges_)-1),
       totalCount(getTotalCount(ranges_)),
       lengths(), cumulants()
   {
       for (auto p: ranges) {
           lengths.push_back(1 + p.second - p.first);
           cumulants.push_back(0);  // just to fill it up
       }
       cumulants[0] = lengths[0];
       for (int i=1; i < lengths.size(); i++)
           cumulants[i] = cumulants[i-1] + lengths[i];
   }

The overloaded function call operator, with the unranking logic, is also in the public part of the class:

    // return a random variate:
    int operator()()
    {
        int  rank = drawRank();
        int     i = 0;
        while (cumulants[i] <= rank)
            i++;
        // here  rank < cum[i] - so rank fits in the i-th range
        // unranking logic :
        if (i==0)  
            return (ranges[0].first + rank);
        else
            return ( ranges[i].first + (rank - cumulants[i-1]) );
    }

Second, the private part, with the actual list of fields and the drawRank() private method:

private:
    int                                 randomSeed;
    std::mt19937                        engine;  // raw random number generator
    std::uniform_int_distribution<int>  ud;
    Ranges                              ranges;
    int                                 totalCount;
    bool                                verbose;
    std::vector<int>                    lengths;
    std::vector<int>                    cumulants;

    static int getTotalCount(const Ranges& ranges)
    {
        int   total = 0;
        for (auto p: ranges)
            total += 1 + p.second - p.first;
        return total;
    }

    int drawRank()
    {
        return ud(engine);
    }
};

Test program:

We can write a small test program, which generates 19*2,000,000 = 38 millions variates, and then computes the corresponding histogram. As this is supposed to be an uniform distribution, we expect each variate to get close to 2 millions instances.

int main()
{
    Ranges  myRanges {{23, 28}, {11, 14}, {31, 39}};
    int   randomSeed {424344};
    RangesGen  gen1(randomSeed, myRanges);  // specialized variate generator

    std::map<int,int>  histogram{};  // empty map

    for (int j=0; j < 19*2000000; j++) {
        int  variate = gen1();
        if (histogram.end() == histogram.find(variate))
            histogram[variate] = 0;  // First sight of this variate, insert it.
        int prevCount = histogram[variate];
        histogram[variate] = prevCount + 1;
    }

    // display histogram contents:
    for (auto itr = histogram.begin(); itr != histogram.end(); ++itr) {
        int variate = itr->first;
        int count   = itr->second;
        std::cout << count << " instances of variate " << variate <<'\n';
    }

    return EXIT_SUCCESS;
}

Test program output:

2001500 instances of variate 11
1999233 instances of variate 12
1999842 instances of variate 13
1999205 instances of variate 14
1999467 instances of variate 23
1999725 instances of variate 24
2001476 instances of variate 25
1999592 instances of variate 26
2001036 instances of variate 27
2002098 instances of variate 28
2000420 instances of variate 31
2000231 instances of variate 32
2000569 instances of variate 33
1996539 instances of variate 34
2000371 instances of variate 35
1999316 instances of variate 36
2000151 instances of variate 37
1998338 instances of variate 38
2000891 instances of variate 39

So the histogram is as expected.

Upvotes: 1

Ryan Pepper
Ryan Pepper

Reputation: 93

Generally, you shouldn't use rand and scale by division, because it can lead to some numbers being generated more commonly. There is a very good talk about this 'rand Considered Harmful from MSDN a few years ago. In modern C++ you should probably use the random distribution methods brought in in C++11 in the <random> header. For e.g.:

std::mt19937 generator(0); // Use mersenne twister algorithm with seed 0 - shouldn't use same seed repeatedly but generate this in some way

// set up a uniform distribution:
std::uniform_int_distribution<> distribution(23, 28);

// generate a random number in that range
distribution(generator)

Upvotes: -1

Bathsheba
Bathsheba

Reputation: 234655

Draw a number uniformly in the range [0, 18].

Add 11.

If that number is 15 or more then add 8.

If that number is 29 or more then add 2.

This method has the advantage that it uses only one drawing. Using multiple drawings can cause the statistical properties of the random numbers to deteriorate depending on the properties of the underlying generator. Sampling with rejection can cause issues with some low discrepancy sequences such as Sobol.

Upvotes: 3

Related Questions