MPB94
MPB94

Reputation: 153

discrete distribution returns integers out of bound

I wrote the function below to sample without replacement. It returns a vector<int> representing a sample being picked among some points. As input, I have a vector<double> containing the probabilities and an integer for the desired sample size. For the function, I use the discrete distribution:

http://www.cplusplus.com/reference/random/discrete_distribution/discrete_distribution/

vector<int> samplingwoutreplacement(vector<double> probs, int samplesize) {
    random_device rd;
    mt19937 generator(rd());
    vector<int> sample;
    sample.reserve(samplesize);
    for (int i = 0; i < samplesize; i++) {
        discrete_distribution<int> distribution(probs.begin(), probs.end());
        int currentpick = distribution(generator);
        if (currentpick >= probs.size()) {
            cout  << endl << "error: range overstepped; current pick is: " << currentpick << endl;
            cout << "probs.size = " << probs.size() << endl;
            for (int j = 0; j < probs.size(); j++) {
                cout << probs[j] << endl;
            }
        }
        probs[currentpick] = 0;
        sample.push_back(currentpick);
    }
    return sample;
}

In my application, I used this sampling a lot of times and after a lot of iterations, the discrete distribution returns an integer larger than the size of the vector containing the probabilities. (More precisely, my vector had size 178, and I got as a return 178, but should get something between an integer between 0 and 177.) How can this happen?

Upvotes: 1

Views: 365

Answers (1)

Bob__
Bob__

Reputation: 12749

In the C++ Standard, we can read about std::discrete_distribution at 26.6.8.6.1 [rand.dist.samp.discrete] (emphasis mine)

A discrete_­distribution random number distribution produces random integers i, 0 ≤ i < n, distributed according to the discrete probability function P(i | p0, …, pn−1) = pi.

Unless specified otherwise, the distribution parameters are calculated as: pk = wk/S for k = 0, …, n−1, in which the values wk, commonly known as the weights, shall be non-negative, non-NaN, and non-infinity. Moreover, the following relation shall hold: 0 < S = w0 + ⋯ + wn−1.

Some of the weights used by the asker (like 1.29272e+308) are so big that their sum overflows the range of double, so that the value of S (which become infinity) and the following calculations become meaningless.

I tested the behavior of gcc, clang and MSVC in such corner case and find out that while gcc and clang produce a distribution with all probabilities equal to zero and std::discrete_distribution::operator() always returns n - 1, MSVC also produces a distribution with all probabilities equal to zero, but the integer returned is always n (an out of bounds value, as experienced by the asker).

To my interpretation, that's not a standard compliant behavior, even if, to their credit, OP's input parameters result in a violation of the preconditions.

Upvotes: 2

Related Questions