I'm drawing a set of 3D Gaussian samples using NumPy, with zero mean and unit variance:
import numpy as np
cov = np.zeros((3,3), dtype=np.float32)
np.fill_diagonal(cov, 1.0)
data_values = np.random.multivariate_normal([0.,0.,0.], cov, size=5000) # 5000 x 3
I can plot each dimension and see Gaussians.
I can't plot the full 3D Gaussian, so to validate it I computed the distance of each sample in data_values from the origin (0,0,0):
dist_from_center = np.sqrt(np.sum((data_values)**2, axis=1)) # array of 5000
When I plot a histogram of the distances, I expect to see half a Gaussian with the mode at zero, but I don't.
Can anyone see the mistake, or explain the result?
The distribution of the distance from the center is not a half-Gaussian. In two dimensions, for example, the distribution is the Rayleigh distribution (a special case of the Rice distribution).
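To make the two-dimensional case concrete, here is a minimal NumPy sketch. It assumes 2-D standard normal samples (identity covariance) and an arbitrary seed, and uses a hypothetical helper rayleigh_pdf for the Rayleigh density r*exp(-r**2/2). It compares a normalized histogram of the distances with that density; note the curve is zero at r = 0, not peaked there.

```python
import numpy as np

def rayleigh_pdf(r, sigma=1.0):
    """Rayleigh density (r / sigma**2) * exp(-r**2 / (2 * sigma**2)), valid for r >= 0."""
    return (r / sigma**2) * np.exp(-r**2 / (2 * sigma**2))

# Assumption: 2-D standard normal samples (identity covariance); seed chosen arbitrarily.
rng = np.random.default_rng(0)
samples_2d = rng.standard_normal(size=(100_000, 2))
dist_2d = np.linalg.norm(samples_2d, axis=1)  # distance of each sample from (0, 0)

# Normalized histogram versus the Rayleigh curve evaluated at the bin centers.
heights, edges = np.histogram(dist_2d, bins=50, range=(0.0, 4.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print("largest bin mismatch:", np.max(np.abs(heights - rayleigh_pdf(centers))))

# For sigma = 1 the Rayleigh mean is sqrt(pi / 2), roughly 1.2533.
print("sample mean:", dist_2d.mean(), "theory:", np.sqrt(np.pi / 2))
```

With this many samples the bin heights should sit close to the curve, and the sample mean close to sqrt(pi/2).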
Here's a quick explanation of what you should expect the distribution to be, using your simple case where the covariance matrix is the identity. The PDF of the Gaussian in 3D then looks like K*exp(-x.dot(x)/2), where K is 1/(2*pi)**(1.5). Rewrite x.dot(x) as r**2, where r is the distance from the origin. So the PDF behaves like K*exp(-r**2/2).
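As a quick numerical check of this step, the sketch below writes out the general multivariate normal density from its standard formula (the helper gaussian_pdf and the test points are illustrative assumptions, not from the original) and confirms that, for zero mean and identity covariance, it matches K*exp(-r**2/2) with K = 1/(2*pi)**1.5.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density written out from the standard formula:
    (2*pi)**(-k/2) * det(cov)**(-1/2) * exp(-0.5 * (x-mean)^T cov^{-1} (x-mean))."""
    k = len(mean)
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ y = diff rather than forming an explicit inverse.
    quad = diff @ np.linalg.solve(cov, diff)
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** k * np.linalg.det(cov))
    return norm_const * np.exp(-0.5 * quad)

K = 1.0 / (2 * np.pi) ** 1.5

# Arbitrary test points (an assumption; any points would do).
points = np.array([[0.0, 0.0, 0.0], [0.3, -1.2, 0.7], [2.0, 1.0, -0.5]])
for x in points:
    r = np.linalg.norm(x)  # distance from the origin; x.dot(x) equals r**2 up to rounding
    general = gaussian_pdf(x, np.zeros(3), np.eye(3))
    radial = K * np.exp(-r**2 / 2)
    print(r, general, radial)  # the last two values agree up to rounding
```

Because the covariance is the identity, the density depends on x only through its length r, which is what makes the shell argument that follows work.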
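Now imagine a thin spherical shell around the origin, with radius r and infinitesimal thickness dr. The "volume" of this thin shell is approximately 4*pi*r**2*dr. Every point in the shell lies at distance r from the origin (to within dr), so the probability contained in the whole shell is what contributes to the distribution of the distances at r. So we multiply the Gaussian PDF (expressed as a function of r) by the volume of this spherical shell, and divide by dr to get the density as a function of r. This gives (2*r**2)/sqrt(2*pi)*exp(-r**2/2). Because the shell volume shrinks to zero as r goes to 0, this density is zero at the origin even though the Gaussian PDF itself is largest there; that is why the histogram does not have its mode at zero. (This distribution is known as the Maxwell-Boltzmann distribution; it is also the chi distribution with 3 degrees of freedom.)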
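The shell argument can be written out as a short derivation, sketched in LaTeX with the same K, r and dr as above. It also checks that the result integrates to 1 and locates the peak of the distance distribution.

```latex
p_R(r)\,dr = K\,e^{-r^2/2}\cdot 4\pi r^2\,dr, \qquad K = (2\pi)^{-3/2}

p_R(r) = \frac{4\pi r^2}{(2\pi)^{3/2}}\,e^{-r^2/2}
       = \frac{4\pi r^2}{2\pi\sqrt{2\pi}}\,e^{-r^2/2}
       = \frac{2r^2}{\sqrt{2\pi}}\,e^{-r^2/2}, \qquad r \ge 0

\int_0^\infty r^2 e^{-r^2/2}\,dr = \frac{\sqrt{2\pi}}{2}
\quad\Longrightarrow\quad \int_0^\infty p_R(r)\,dr = 1

\frac{d}{dr}\left(r^2 e^{-r^2/2}\right) = (2r - r^3)\,e^{-r^2/2} = 0
\quad\Longrightarrow\quad r_{\text{mode}} = \sqrt{2} \approx 1.414, \qquad p_R(0) = 0
```

So the histogram of distances should start at zero, rise to a peak near r = sqrt(2), and then decay, rather than look like half a Gaussian.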
Here's a plot of a histogram of the distances, along with that function of r:
[Figure: normalized histogram of dist_from_center overlaid with the Maxwell-Boltzmann PDF]
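The histogram was generated using the following (in current matplotlib releases, the old normed=True argument has been replaced by density=True):
import matplotlib.pyplot as plt
plt.hist(dist_from_center, bins=25, density=True)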
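If matplotlib isn't at hand, the same comparison can be made numerically. This sketch is an illustration, not the code behind the original plot: it assumes NumPy's Generator API (np.random.default_rng) with an arbitrary seed, uses np.eye(3) for the identity covariance, and introduces a hypothetical helper maxwell_pdf for the density (2*r**2)/sqrt(2*pi)*exp(-r**2/2). It builds the normalized histogram with np.histogram(..., density=True) and compares the bar heights with the predicted curve.

```python
import numpy as np

def maxwell_pdf(r):
    """Density of the distance from the origin for a 3-D standard normal:
    2 * r**2 / sqrt(2*pi) * exp(-r**2 / 2), for r >= 0."""
    return 2 * r**2 / np.sqrt(2 * np.pi) * np.exp(-r**2 / 2)

# Same setup as the question; the seeded Generator is an assumption for reproducibility.
rng = np.random.default_rng(42)
data_values = rng.multivariate_normal([0.0, 0.0, 0.0], np.eye(3), size=5000)
dist_from_center = np.linalg.norm(data_values, axis=1)  # same as sqrt(sum(x**2, axis=1))

# Numeric counterpart of hist(..., bins=25, density=True): bar heights as densities.
heights, edges = np.histogram(dist_from_center, bins=25, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mismatch = np.max(np.abs(heights - maxwell_pdf(centers)))
peak_r = centers[np.argmax(heights)]

print("largest bin mismatch:", mismatch)
print("histogram peak near r =", peak_r, "; theory predicts sqrt(2) =", np.sqrt(2))
print("sample mean:", dist_from_center.mean(), "; theory 2*sqrt(2/pi) =", 2 * np.sqrt(2 / np.pi))
```

With 5000 samples, individual bins fluctuate by a few percent, so the peak lands near sqrt(2) rather than exactly on it.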