Reputation: 1
The scipy.stats function 'multivariate_normal' doesn't produce the desired distribution. I am using the multivariate normal to allow for generality as part of a physics project, but the distribution of the z-component of the vector is not as expected, even when the standard deviation matrix only has a z-component, as below:
from scipy import stats
import numpy as np
mu = np.array([0,0,2000])
sigma = np.diag([0,0,50])
vals = stats.multivariate_normal(mu, sigma, allow_singular=True).rvs(10000)
does not produce the desired distribution in vals[2] as expected, a normal distribution with standard deviation of 50 and mean of 2000. Instead the mean is correct, but the standard deviation is much smaller than it should be. What am I doing wrong here?
Upvotes: 0
Views: 32
Reputation: 13491
Parameter of multivariate_normal
are mean and covariance. In your case, with diagonal covariance, that is pragmatically mean and variance of your 3 columns.
So, variance of 3rd column you get is 50 (covariance of a variable with itself is its variance). So you should expect a normal distribution with a variance of 50, and a standard deviation of √50
print(vals[:,2].var(), vals[:,2].std())
# 50.59960532406098 7.113339955608827
plt.hist(vals[:,2], density=True, bins=50)
x=np.linspace(1960, 2040, 100)
plt.plot(x, stats.norm(2000, np.sqrt(50)).pdf(x))
So, seems fine to me. Just, for a single variable normal it is quite usual to specify standard deviation (and scipy.stats, anyway, tends to favor loc/scale type of parameters when suited, and for a normal law, loc is mean, and scale is stdev).
But covariance (on the diagonal) are variances, not standard deviation. And it wouldn't be possible to use the square root of that for sake of uniformity, since, outside the diagonal, there is no reason to think that the covariance even has a square root.
Upvotes: 0