Reputation: 119
I' running a small Monte Carlo test, where n is the number of i.i.d. generated and R are the iterations. This is my code:
R=50
n=100
lambda=30
v<-c()
for (i in 1:R){
u <- rpois(n, lambda)
v[i] <- n*mean(u)
}
hist(v, breaks=30, main = "")
curve(dnorm(n*x, n*lambda, sd = n*lambda), col="Red", add = TRUE)
We indeed now that the sum of n independendent Poisson's of parameter lambda is a Poisson of parameter n times lambda. Hence by the central limit in makes sense to approximate their sum with a normal of mean and variance equal to n*lambda. But whenever I try to plot curve(dnorm), R draws it as if it was constantly 0, and I don't get why is that so.
Upvotes: 0
Views: 412
Reputation: 1414
I would say that the problem is more fundamental than the (correct) finding by Érico Patto, namely the line of code:
curve(dnorm(n*x, n*lambda, sd = n*lambda), col="Red", add = TRUE)
is not the appropriate one to overlay on the histogram because of three errors:
dnorm()
.I now explain each of these points in detail.
1) Standard deviation: Your line of code states that the standard deviation of the normal is n*lambda
while this expression gives the variance of the sum of n
independent Poisson of parameter lambda
.
2) First argument of dnorm()
: The first argument should be x
, not n*x
as the first argument represents the values on which the dnorm()
function should be evaluated. These values are by default 101 equally spaced points ranging from the smallest x value to the largest x value of the plot on which the curve is added (since you are using add=TRUE
--c.f. documentation of curve
). In this case, this is the range of X-axis of the histogram.
3) Vertical scale: In order for the density curve to be visible when laid over the histogram, the vertical scale of the histogram should be "compatible" with the vertical scale of the density. Since your histogram plots the FREQUENCY on the vertical scale, its values are much larger than the values of the density curve (which are in the thousandth scale); so a scale of 1/1000 is overlaid over a scale of 1 and therefore the curve is not visible.
That said, the last two lines of code that should be used to achieve your goal of overlaying the density of the limiting normal distribution of the sum of n
independent Poisson random variables with parameter lambda
on its histogram is:
hist(v, breaks = 30, main = "", freq = FALSE)
curve(dnorm(x, mean = n*lambda, sd = sqrt(n*lambda)), col = "Red", add = TRUE)
which has three differences w.r.t. to your code, and gives the following picture (for a particular set of randomly-generated values in vector v
):
And if you use R=500
replications instead of your 100 replications you get a better approximation by the normal:
Upvotes: 1
Reputation: 1015
It's rather simple, actually. Your cure is non-zero between -100 and 150 (I discovered that empirically, so it's not very accurate), and your histogram goes from 2850 to 3100.
This is your curve:
curve(dnorm(n*x, n*lambda, sd = n*lambda), from = -100, to = 150, col="Red")
And this is your histogram:
hist(v, breaks=30, main = "", freq = FALSE)
Upvotes: 0