Reputation: 546
In an interview, I was given the following problem to solve, initially using pen and paper and then via a program to verify the result.
The question is as follows:
There are three people, A, B and C. Each is capable of hitting a target with probability 6/7, 4/5 and 3/4 respectively. What is the probability that, if they each fire one shot, exactly two of them hit the target?
The answer is:
P(...) = P(A)*P(B)*(1-P(C)) +
         P(B)*P(C)*(1-P(A)) +
         P(C)*P(A)*(1-P(B))
       = 27.0/70.0
       = 38.57142857142857142857...%
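Working through the arithmetic term by term:
(6/7)*(4/5)*(1/4) + (4/5)*(3/4)*(1/7) + (3/4)*(6/7)*(1/5)
= 24/140 + 12/140 + 18/140
= 54/140
= 27/70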
Below is my solution to the problem:
#include <cstdio>
#include <cmath>
#include <ctime>
#include <random>

int main()
{
    std::mt19937 engine(time(0));
    engine.discard(10000000); // warm up the generator
    std::uniform_real_distribution<double> uniform_real(0.0, 1.0);

    // individual hit probabilities
    double prA = (6.0 / 7.0);
    double prB = (4.0 / 5.0);
    double prC = (3.0 / 4.0);

    std::size_t trials = 4000000000;
    std::size_t total_success = 0;

    for (std::size_t i = 0; i < trials; ++i)
    {
        int current_success = 0;
        if (uniform_real(engine) < prA) ++current_success;
        if (uniform_real(engine) < prB) ++current_success;
        if (uniform_real(engine) < prC) ++current_success;

        if (current_success == 2)
            ++total_success;

        // running estimate, printed every million trials
        double prob = (total_success * 1.0) / (i + 1);
        if ((i % 1000000) == 0)
        {
            printf("%05zu Pr(...) = %12.10f error:%15.13f\n",
                   i,
                   prob,
                   std::abs((27.0/70.0) - prob));
        }
    }
    return 0;
}
The problem is as follows: regardless of how large a number of trials I run, the probability flat-lines at roughly 0.3857002101. Is there something wrong with the code?
The interviewer said it is trivial to get the result to converge to around 9 decimal place accuracy within 1 million trials, regardless of the seed.
Any ideas on where the bug is in my code?
UPDATE 1: I've tried the above code with the following generators; they all seem to plateau at around the same point, roughly trial 10^9.
UPDATE 2: Thinking about the problem, I've gone down the following track. The ratio 27/70 is made up of 27 and 70, which are coprime, and the number of multiples of 70 under 4x10^9 is roughly 57x10^6, or about 1.4% of all the numbers. Hence the probability of getting the "exact" ratio of 27/70 from two numbers selected at random in [0, 4x10^9] is roughly 1.4% (there are more multiples of 27 within 4x10^9), so getting the exact ratio is very unlikely, and that stays constant regardless of the number of trials.
Now if one were to talk about thicker bounds - i.e. numbers within +/-5 of a multiple of 70 - that increases the probability of choosing a pair of numbers at random within [0, 4x10^9] whose ratio is within the specified tolerance to roughly 14%, but with this technique the best we can get will on average be accurate to roughly 5 decimal places when compared to the exact value. Is this way of reasoning correct?
Upvotes: 19
Views: 552
Reputation: 18675
Because the probabilities are given as rational numbers with small denominators, you can view the possible situations as a cube of dimensions 7x5x4, which makes 140 sub-cubes (the product of the denominators). Rather than jumping around randomly, you can visit each sub-cube explicitly as follows and get the exact answer in 140 iterations:
#include <cstdio>
#include <cmath>

int main()
{
    std::size_t total_success = 0, num_trials = 0;
    for (unsigned a = 1; a <= 7; ++a)
    {
        unsigned success_a = 0;
        if (a <= 6)
            // a hits 6 out of 7 times
            success_a = 1;
        for (unsigned b = 1; b <= 5; ++b)
        {
            unsigned success_b = 0;
            if (b <= 4)
                // b hits 4 out of 5 times
                success_b = 1;
            for (unsigned c = 1; c <= 4; ++c)
            {
                unsigned success_c = 0;
                // c hits 3 out of 4 times
                if (c <= 3)
                    success_c = 1;
                // count cases where exactly two of them hit
                if (success_a + success_b + success_c == 2)
                    ++total_success;
                ++num_trials;
            } // loop over c
        } // loop over b
    } // loop over a

    // exactly two hit in 54 of the 140 sub-cubes, so prob == 54/140 == 27/70
    double prob = (total_success * 1.0) / num_trials;
    printf("Pr(...) = %12.10f error:%15.13f\n",
           prob,
           std::abs((27.0/70.0) - prob));
    return 0;
}
Upvotes: 3
Reputation: 14205
Monte Carlo methods tend to converge slowly --- the error you expect after n simulations is proportional to 1/sqrt(n). Five digits of accuracy after 10^9 trials seems about right, actually. There's no numerical voodoo going on here.
If the interviewer was talking about straight-up Monte Carlo sampling as you've done, it's...implausible that he could get nine digits of accuracy after a million trials.
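As a back-of-the-envelope check, the standard error of a sample proportion after n trials is sqrt(p*(1-p)/n); a quick sketch that evaluates it for the two trial counts in question:
#include <cstdio>
#include <cmath>

int main()
{
    const double p = 27.0 / 70.0;               // true probability
    const double n_counts[] = {1.0e6, 1.0e9};   // trial counts discussed above
    for (double n : n_counts)
    {
        // standard error of the estimated proportion after n Bernoulli trials
        double se = std::sqrt(p * (1.0 - p) / n);
        printf("n = %.0e  expected error ~ %.1e\n", n, se);
    }
    return 0;
}
That comes out at roughly 4.9e-4 for a million trials and 1.5e-5 for a billion, i.e. about 3 and 4-5 correct digits respectively.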
Upvotes: 8
Reputation: 19601
FWIW, the following Java seems to be converging on the predicted answer from above at about the rate you would expect (it also prints the worst-case standard deviation of the estimate):
import java.util.Random;
import java.security.SecureRandom;
/** from question in Stack Overflow */
public class SoProb
{
public static void main(String[] s)
{
long seed = 42;
/* Problem statement, the 27/70 answer, and the original C++ (std::mt19937
   seeded with time(0)) are all in the question above. */
Random r = new Random(seed);
// Random r = new SecureRandom(new byte[] {(byte)seed});
// std::uniform_real_distribution<double> uniform_real(0.0,1.0);
double prA = (6.0 / 7.0);
double prB = (4.0 / 5.0);
double prC = (3.0 / 4.0);
// double prB = (6.0 / 7.0);
// double prC = (4.0 / 5.0);
// double prA = (3.0 / 4.0);
double pp = prA*prB*(1-prC) +
prB*prC*(1-prA) +
prC*prA*(1-prB);
System.out.println("Pp " + pp);
System.out.println("2870 " + (27.0 / 70.0));
// std::size_t trails = 4000000000;
int trails = Integer.MAX_VALUE;
// std::size_t total_success = 0;
int total_success = 0;
int aCount = 0;
int bCount = 0;
int cCount = 0;
int pat3 = 0; // A, B
int pat5 = 0; // A, C
int pat6 = 0; // B, C
double pat3Prob = prA * prB * (1.0 - prC);
double pat5Prob = prA * prC * (1.0 - prB);
double pat6Prob = prC * prB * (1.0 - prA);
System.out.println("Total pats " +
(pat3Prob + pat5Prob + pat6Prob));
for (int i = 0; i < trails; ++i)
{
int current_success = 0;
// if (uniform_real(engine) < prA) ++current_success;
int pat = 0;
if (r.nextDouble() < prA)
{
++current_success;
aCount++;
pat += 1;
}
// if (uniform_real(engine) < prB) ++current_success;
if (r.nextDouble() < prB)
{
++current_success;
bCount++;
pat += 2;
}
// if (uniform_real(engine) < prC) ++current_success;
if (r.nextDouble() < prC)
{
++current_success;
cCount++;
pat += 4;
}
switch (pat)
{
case 3:
pat3++;
break;
case 5:
pat5++;
break;
case 6:
pat6++;
break;
}
if (current_success == 2)
++total_success;
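// (successes + 1) / (trials + 2): a Laplace-style smoothed estimate rather than the raw ratio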
double prob = (total_success + 1.0) / (i+2);
if ((i % 1000000) == 0)
{
/*
printf("%05d Pr(...) = %12.10f error:%15.13f\n",
i,
prob,
std::abs((27.0/70.0) - prob));
*/
System.out.println(i + "P rob = " + prob +
" error " + Math.abs((27.0 / 70.0) - prob));
Double maxVar = 0.25 / i;
System.out.println("Max stddev " + Math.sqrt(maxVar));
double ap = (aCount + 1.0) / (i + 2.0);
double bp = (bCount + 1.0) / (i + 2.0);
double cp = (cCount + 1.0) / (i + 2.0);
System.out.println("A error " + (ap - prA));
System.out.println("B error " + (bp - prB));
System.out.println("C error " + (cp - prC));
double p3Prob = (pat3 + 1.0) / (i + 2.0);
double p5Prob = (pat5 + 1.0) / (i + 2.0);
double p6Prob = (pat6 + 1.0) / (i + 2.0);
System.out.println("P3 error " + (p3Prob - pat3Prob));
System.out.println("P5 error " + (p5Prob - pat5Prob));
System.out.println("P6 error " + (p6Prob - pat6Prob));
System.out.println("Pats " + (pat3 + pat5 + pat6) +
" success " + total_success);
}
}
}
}
Current output:
1099000000P rob = 0.3857148864682168 error 6.00753931045972E-7
Max stddev 1.508242443516904E-5
A error -2.2208501193610175E-6
B error 1.4871155568862982E-5
C error 1.0978161945063292E-6
P3 error -1.4134927830977695E-7
P5 error -5.363291293969397E-6
P6 error 6.1072143395513034E-6
Pats 423900660 success 423900660
Upvotes: 2
Reputation: 26040
Firstly, some elementary math shows that it is not possible to get 9 places of precision with only a million trials. Given that our probability is 27/70, we can calculate x/1000000 = 27/70, which gives x = 385714.28571. If we had a very, very accurate uniform random number generator which generated exactly 385714 correct trials, this would give us an error of approximately abs(385714/1000000 - 0.38571428571428573) = 2.857142857304318e-07, which is well off the wanted 9 places of precision.
I don't think your analysis is correct. Given a very accurate distribution, it is certainly possible to get the required precision. However, any skewness from uniformity in the distribution will severely hamper the precision. If we do 1 billion trials, the best precision we can hope for is around 2.85 * 10^-10. If the distribution is skewed by even 100, this will be knocked down to about 1 * 10^-7. I'm not sure of the accuracy of most PRNG distributions, but the problem will be having one which is accurate to this degree. Having a quick play with std::uniform_real_distribution<double>(0.0, 1.0), it looks certain to have more variance than this.
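To make the quantization part of that argument concrete, here is a small sketch that computes the best error the estimate could possibly achieve, given that it has to be an integer count divided by n:
#include <cstdio>
#include <cmath>

int main()
{
    const double p = 27.0 / 70.0;
    const long long n_counts[] = {1000000LL, 1000000000LL};
    for (long long n : n_counts)
    {
        // the estimate is necessarily (integer count) / n, so the best case
        // is the integer count closest to p * n
        long long best_count = std::llround(p * double(n));
        double best_error = std::fabs(double(best_count) / double(n) - p);
        printf("n = %lld  best achievable error = %.3e\n", n, best_error);
    }
    return 0;
}
That reproduces the ~2.9e-7 and ~2.9e-10 limits above; any bias in the generator only adds to them.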
Upvotes: 16
Reputation: 241691
The interviewer said it is trivial to get the result to converge to around 9 decimal place accuracy within 1 million trials, regardless of the seed.
Well, that's just obviously ridiculous. You cannot get an estimate within one in a thousand million with a million trials. If the total was only one different from the theoretical value, you'd be off by one in a million, which is a thousand times bigger than "9 decimal places".
By the way, C++11 comes with a perfectly good std::uniform_int_distribution, which actually handles round-off correctly: it divides the total range of the uniform generator into an exact multiple of the desired range plus a remainder, and discards values generated in the remainder, so the values it produces are not biased by round-off. I made a slight modification to your test program, and it does converge to six digits in a billion trials, which is about what I'd expect:
#include <cstdio>
#include <cmath>
#include <ctime>
#include <random>

int main() {
    std::mt19937 engine(time(0));
    // integer distributions: a value of 0 counts as a miss, anything else as a hit
    std::uniform_int_distribution<int> a_distr(0, 6);   // hits 6 out of 7
    std::uniform_int_distribution<int> b_distr(0, 4);   // hits 4 out of 5
    std::uniform_int_distribution<int> c_distr(0, 3);   // hits 3 out of 4
    std::size_t trials = 4000000000;
    std::size_t total_success = 0;
    for (std::size_t i = 1; i <= trials; ++i) {
        int current_success = 0;
        if (a_distr(engine)) ++current_success;
        if (b_distr(engine)) ++current_success;
        if (c_distr(engine)) ++current_success;
        if (current_success == 2) ++total_success;
        if ((i % 1000000) == 0) {
            printf("%05zu Pr(...) = %12.10f error:%15.13f\n",
                   i,
                   double(total_success) / i,
                   std::abs((27.0/70.0) - double(total_success) / i));
        }
    }
    return 0;
}
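For the round-off point above, a minimal sketch of the rejection idea (the helper name bounded is just for illustration; the actual std::uniform_int_distribution implementation may differ): draws that land in the leftover remainder at the top of the generator's range are discarded, so each value in [0, n) is equally likely.
#include <cstdint>
#include <random>

// sketch only: unbiased integer in [0, n) from a 32-bit generator (n > 0)
unsigned bounded(std::mt19937& engine, unsigned n)
{
    const std::uint64_t range = std::uint64_t(1) << 32;   // mt19937 outputs values in [0, 2^32)
    const std::uint64_t limit = range - (range % n);      // largest exact multiple of n
    std::uint64_t x;
    do {
        x = engine();          // raw 32-bit draw
    } while (x >= limit);      // discard draws from the biased remainder
    return unsigned(x % n);
}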
Upvotes: 8