Reputation: 153
I wrote the function below to sample without replacement. It returns a vector<int>
representing a sample being picked among some points. As input, I have a vector<double>
containing the probabilities and an integer for the desired sample size. For the function, I use the discrete distribution:
http://www.cplusplus.com/reference/random/discrete_distribution/discrete_distribution/
vector<int> samplingwoutreplacement(vector<double> probs, int samplesize) {
random_device rd;
mt19937 generator(rd());
vector<int> sample;
sample.reserve(samplesize);
for (int i = 0; i < samplesize; i++) {
discrete_distribution<int> distribution(probs.begin(), probs.end());
int currentpick = distribution(generator);
if (currentpick >= probs.size()) {
cout << endl << "error: range overstepped; current pick is: " << currentpick << endl;
cout << "probs.size = " << probs.size() << endl;
for (int j = 0; j < probs.size(); j++) {
cout << probs[j] << endl;
}
}
probs[currentpick] = 0;
sample.push_back(currentpick);
}
return sample;
}
In my application, I used this sampling a lot of times and after a lot of iterations, the discrete distribution returns an integer larger than the size of the vector containing the probabilities. (More precisely, my vector had size 178, and I got as a return 178, but should get something between an integer between 0 and 177.) How can this happen?
Upvotes: 1
Views: 365
Reputation: 12749
In the C++ Standard, we can read about std::discrete_distribution
at 26.6.8.6.1 [rand.dist.samp.discrete] (emphasis mine)
A
discrete_distribution
random number distribution produces random integers i, 0 ≤ i < n, distributed according to the discrete probability function P(i | p0, …, pn−1) = pi.Unless specified otherwise, the distribution parameters are calculated as: pk = wk/S for k = 0, …, n−1, in which the values wk, commonly known as the weights, shall be non-negative, non-NaN, and non-infinity. Moreover, the following relation shall hold: 0 < S = w0 + ⋯ + wn−1.
Some of the weights used by the asker (like 1.29272e+308) are so big that their sum overflows the range of double
, so that the value of S (which become infinity) and the following calculations become meaningless.
I tested the behavior of gcc, clang and MSVC in such corner case and find out that while gcc and clang produce a distribution with all probabilities equal to zero and std::discrete_distribution::operator()
always returns n - 1, MSVC also produces a distribution with all probabilities equal to zero, but the integer returned is always n (an out of bounds value, as experienced by the asker).
To my interpretation, that's not a standard compliant behavior, even if, to their credit, OP's input parameters result in a violation of the preconditions.
Upvotes: 2