tejas_kale

Reputation: 603

Understanding Numpy's `multivariate_normal` method

I wish to generate samples from a multivariate Gaussian distribution with 0 mean and a very low standard deviation (0.001). But when I plot the resultant samples, I am confused about their range.

If we look at a random sample generated from a standard Gaussian distribution, it looks fine. A good proportion of the samples lie within (-1, +1) on each axis (ideally about 68% per axis, i.e. within one standard deviation). Besides, all the samples lie in the range (-3, +3), which seems fine.

import numpy as np
import matplotlib.pyplot as plt

mean = np.array([0., 0.])
cov1 = np.array([[1., 0.], [0., 1.]])
size = 100

vals1 = np.random.multivariate_normal(mean, cov1, size)
plt.scatter(vals1[:, 0], vals1[:, 1])

Samples from standard Gaussian distribution

Now when I reduce the standard deviation to 0.001, I expect the samples to be in the range (-0.003, 0.003) but they are an order of magnitude higher. I see them lie in the range (-0.06, 0.06).

cov2 = np.array([[0.001, 0.], [0., 0.001]])

vals2 = np.random.multivariate_normal(mean, cov2, size)
plt.scatter(vals2[:, 0], vals2[:, 1])

Samples from Gaussian distribution with std=0.001

I suppose there is something wrong with the way I am interpreting the range of samples from a multivariate Gaussian. Can anyone help me make sense of these results? Thanks.

Upvotes: 0

Views: 2631

Answers (1)

vanPelt2

Reputation: 898

If a covariance matrix is diagonal, its diagonal entries are the variances (σ^2) of each variable. So when you have

cov2 = np.array([[0.001, 0.], [0., 0.001]])

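you are really choosing the standard deviation (σ) of each variable to be sqrt(0.001) ≈ 0.0316, not 0.001. Nearly all samples should then fall within about ±3σ ≈ ±0.095, which is consistent with the (-0.06, 0.06) range you observe.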
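If the goal is a standard deviation of 0.001 per axis, the diagonal entries would need to be σ^2 = 1e-6. Here is a minimal sketch of that, assuming independent axes as in the question. The names `sigma`, `rng`, `sample_std` and `frac_within_3sigma` are mine, the seed is arbitrary, and I use `np.random.default_rng` instead of the legacy `np.random.multivariate_normal` only to make the run reproducible:

```python
import numpy as np

sigma = 0.001                          # desired standard deviation per axis
mean = np.zeros(2)
cov = np.diag([sigma**2, sigma**2])    # diagonal holds variances, i.e. 1e-6

rng = np.random.default_rng(0)         # arbitrary seed, for reproducibility
vals = rng.multivariate_normal(mean, cov, size=100_000)

# Empirical check: per-axis standard deviation and the share of
# coordinates that fall inside (-3*sigma, +3*sigma).
sample_std = vals.std(axis=0)
frac_within_3sigma = np.mean(np.abs(vals) < 3 * sigma)

print(sample_std)
print(frac_within_3sigma)
```

With this covariance the per-axis standard deviation should come out close to 0.001, and almost all coordinates should lie inside (-0.003, 0.003), which is the range the question expected.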

Upvotes: 1
