Reputation: 141
I want to run a Monte Carlo simulation in which I generate 10 scenarios, each characterized by a random number of arrivals over a time horizon.
I use scipy.stats.poisson
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.poisson.html
to generate the samples of arrivals for each scenario, assuming that the mean is 12.
from scipy.stats import poisson
arrivals = poisson.rvs(12, size=10)  # size must be passed by keyword; the second positional argument is loc
print(arrivals)
The output is a list of random numbers:
[11 13 9 10 8 9 13 12 11 23]
The mean is 11.9, which is good enough, but the problem is that the last scenario has 23 arrivals, which is far from the mean of 12.
Since I had to select a population before running this simulation, I have to make that population large enough to accommodate the Poisson random variates. Say I select a population of size 1.5 * 12 = 18; in the last scenario I will then get an error, since the sample is larger than the population itself.
My first question is: what is the minimum population size I have to select in order to sample these arrivals from a list of Poisson random variates without getting an error?
My second question is: is there a better way to manage this kind of problem by using another probability distribution?
Please note that in this case mean=12, but I have to simulate other contexts in which mean=57 and mean=234.
Upvotes: 1
Views: 567
Reputation: 3824
I have to make the size of that population large enough to comply with the Poisson Random Variates
The Poisson distribution is defined on all non-negative integers (from 0 to infinity), so no finite population size can rule out an error completely. In theory, if you generate numbers from that distribution, you should expect to get any non-negative integer, but those far away from the mean (lambda) have a low probability of appearing. For example, the probability of getting a value greater than 18 (that is, 19 or higher) using a lambda parameter of 12 is 3.7%:
>>> poisson.sf(18,12)
0.037416489663391859
Thus, if you want to know the minimal size needed so that at most 1% of the simulated scenarios exceed it, you can use the inverse survival function:
>>> poisson.isf(0.01,12)
21.0
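The same calculation can be repeated for the other means mentioned in the question (57 and 234). This is only a rough sketch: the helper name min_population and the 1% tolerance are my own assumptions, not something from your code. Note that the tolerance applies to each scenario separately, so across 10 scenarios the chance of at least one overflow is higher.

```python
from scipy.stats import poisson

def min_population(mean, error_rate=0.01):
    """Smallest N with P(X > N) <= error_rate for X ~ Poisson(mean).

    Hypothetical helper; error_rate is a per-scenario tolerance.
    """
    return int(poisson.isf(error_rate, mean))

# Means taken from the question.
for mean in (12, 57, 234):
    n = min_population(mean)
    # poisson.sf(n, mean) is the probability of drawing more than n arrivals.
    print(mean, n, poisson.sf(n, mean))
```

If you need a stricter guarantee for the whole run, pass a smaller error_rate.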
Lambda is the mean number of arrivals during a period of time, not the maximum value (the size of the population). I assume the simulation code can't be changed to use the max value from your sample.
The Poisson distribution seems sensible for your case. However, if you want a distribution that reflects the maximum N given by your population size, you could fit the parameters of a more flexible one, such as the Beta-binomial distribution. My suggestion is to look for real data on your phenomenon and then fit or derive a probability function from it. An even simpler solution is to bootstrap from that data by picking values randomly. For statistical questions, you are encouraged to use Cross Validated.
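As an illustration of the bounded alternative, here is a minimal sketch using scipy.stats.betabinom (available in recent SciPy versions, 1.4 or later as far as I know). The shape parameters a and b are hypothetical values I picked only so that the mean comes out at 12 with a population of 18; with real data you would fit them instead.

```python
from scipy.stats import betabinom

population = 18   # hard upper bound on arrivals (size from the question)
a, b = 8.0, 4.0   # hypothetical shape parameters, not fitted to any data
# Mean of a Beta-binomial is population * a / (a + b) = 18 * 8 / 12 = 12.

# Draw 10 scenarios; every sample lies between 0 and population inclusive.
arrivals = betabinom.rvs(population, a, b, size=10, random_state=0)
print(arrivals)
print(betabinom.mean(population, a, b))
```

Unlike the Poisson samples, these can never exceed the population size, at the cost of having to choose two extra parameters.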
Upvotes: 2