Reputation: 603
I wish to generate samples from a multivariate Gaussian distribution with 0 mean and a very low standard deviation (0.001). But when I plot the resultant samples, I am confused about their range.
If we look at a random sample generated from a standard Gaussian distribution, it looks fine. A good proportion of samples lie within the (-1, +1) boundary on both axes (ideally about 68% per axis?). Besides, all the samples lie in the range (-3, +3), which seems fine.
import numpy as np
import matplotlib.pyplot as plt

mean = np.array([0., 0.])
cov1 = np.array([[1., 0.], [0., 1.]])
size = 100
vals1 = np.random.multivariate_normal(mean, cov1, size)
plt.scatter(vals1[:, 0], vals1[:, 1])
Now when I reduce the standard deviation to 0.001, I expect the samples to be in the range (-0.003, 0.003), but they are an order of magnitude higher. I see them lie in the range (-0.06, 0.06).
cov2 = np.array([[0.001, 0.], [0., 0.001]])
vals2 = np.random.multivariate_normal(mean, cov2, size)
plt.scatter(vals2[:, 0], vals2[:, 1])
I suppose there is something wrong with the way I am interpreting the range of samples from a multivariate Gaussian. Can anyone help me make sense of these results? Thanks.
Upvotes: 0
Views: 2631
Reputation: 898
If a covariance matrix is diagonal, its diagonal entries are the variances (σ^2) of each variable. So when you have
cov2 = np.array([[0.001, 0.], [0., 0.001]])
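you are really choosing the standard deviation (σ) in each variable to be sqrt(0.001) ≈ 0.0316, not 0.001. Three standard deviations is then about 0.095, which is consistent with samples spread over roughly (-0.06, 0.06). To get σ = 0.001, the diagonal entries must be σ^2 = 0.001**2 = 1e-6.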
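As a quick check of the point above, here is a minimal sketch that builds the covariance from the desired standard deviation and then measures the spread of the samples. The names `sigma`, `rng`, `empirical_std` and `frac_within_3sigma`, the seed, and the sample size are my own choices for illustration, and it uses `np.random.default_rng` rather than the legacy `np.random.multivariate_normal` call from the question.

```python
import numpy as np

# Seed and sample size are arbitrary choices for this illustration.
rng = np.random.default_rng(0)

sigma = 0.001                          # desired standard deviation per axis
mean = np.zeros(2)
cov = np.diag([sigma**2, sigma**2])    # diagonal holds variances, i.e. sigma squared

vals = rng.multivariate_normal(mean, cov, size=100_000)

# Per-axis sample standard deviation; should be close to sigma.
empirical_std = vals.std(axis=0)

# Fraction of points with both coordinates inside (-3*sigma, 3*sigma).
frac_within_3sigma = np.mean(np.all(np.abs(vals) < 3 * sigma, axis=1))

print(empirical_std)
print(frac_within_3sigma)
```

With the covariance set this way, nearly all points should fall inside (-0.003, 0.003) on both axes, which is the range the question expected.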
Upvotes: 1