Why does the sample mean from multivariate_normal differ from the mean of the distribution?

Question

import numpy as np
np.random.seed(12)
num_observations = 5
x1 = np.random.multivariate_normal([1, 1], [[1, .75],[.75, 1]], num_observations)

# Accumulate the rows by hand (named total to avoid shadowing the built-in sum)
total = 0
for row in x1:
    total += row

print(total / num_observations)

This snippet prints [ 0.95766788 0.79287083]. Shouldn't it be [1, 1], since I passed [1, 1] as the mean when generating the multivariate normal samples?

Brad Solomon · Accepted Answer

What multivariate_normal does is:

Draw random samples from a multivariate normal distribution.

The key word here is draw. You are taking a fairly small sample, and a sample is not guaranteed to have the same mean as the distribution it was drawn from. (The mean you pass in is the mathematical expectation, nothing more, and your sample size is only 5.)

x1.mean(axis=0)
# array([ 0.958,  0.793])

Consider testing this by taking a much larger sample, where the law of large numbers dictates that your means should more reliably approach 1.00000...

x2 = np.random.multivariate_normal([1, 1], [[1, .75],[.75, 1]], 10000)
x2.mean(axis=0)
# array([ 1.001,  1.009])
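To see the trend rather than a single large sample, here is a minimal sketch that repeats the draw at several sample sizes. It assumes NumPy's newer Generator API (np.random.default_rng) instead of the legacy np.random.seed used in the question; the seed and the list of sample sizes are arbitrary choices for illustration, not part of the original post.

```python
import numpy as np

rng = np.random.default_rng(12)  # arbitrary seed, only for reproducibility
mean = [1, 1]
cov = [[1, 0.75], [0.75, 1]]

errors = {}
for n in (5, 100, 10_000, 1_000_000):
    sample = rng.multivariate_normal(mean, cov, n)  # shape (n, 2)
    sample_mean = sample.mean(axis=0)
    # Largest absolute deviation of the sample mean from the true mean
    errors[n] = np.abs(sample_mean - mean).max()
    print(n, sample_mean, errors[n])
```

The deviation does not have to shrink at every single step, since each draw is random, but it should typically fall roughly like 1/sqrt(n) as n grows.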

In other words: say you had a population of 300 million people where the average age was 50. If you randomly picked 5 of them, you would expect your mean of the 5 to be 50, but it probably wouldn't be exactly 50, and might even be significantly far off from 50.
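The same point can be checked numerically for the original distribution: draw many independent samples of size 5 and look at how their means are spread. The sketch below assumes NumPy's Generator API, and the repetition count and seed are arbitrary choices for illustration. Each component has variance 1, so the standard deviation of a 5-observation mean should be about 1/sqrt(5).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
mean = [1, 1]
cov = [[1, 0.75], [0.75, 1]]
n, reps = 5, 20_000

# reps independent samples, each with n observations: shape (reps, n, 2)
samples = rng.multivariate_normal(mean, cov, size=(reps, n))
sample_means = samples.mean(axis=1)  # one mean vector per sample: (reps, 2)

average_of_means = sample_means.mean(axis=0)  # close to [1, 1]
spread_of_means = sample_means.std(axis=0)    # close to 1 / sqrt(5)
theoretical_spread = 1 / np.sqrt(n)

print(average_of_means, spread_of_means, theoretical_spread)
```

On average the 5-observation means are centred on [1, 1], but any single one is typically off by around 0.45 per component, so a result like the one in the question is entirely ordinary.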
