Banad

Reputation: 11

Mean of normal distribution generated using numpy.random.randn() is not '0'

I am following a tutorial from Quantopian that aims to show that samples look more and more like a normal distribution as the sample size increases.

I tried to generate a normal distribution using the numpy.random.randn() method as shown in the tutorial.

I understand that this method returns a sample from the standard normal distribution, which has mean = 0 and standard deviation = 1.

But when I check the mean and standard deviation of this sample, I get unexpected values, i.e., mean = 0.23 and standard deviation = 0.49.

CODE:

import numpy as np
import matplotlib.pyplot as plt
#np.random.seed(123)
normal = np.random.randn(6)

print(normal.mean())
print(normal.std())

RESULT:

0.231567632423
0.488577812058

I am guessing this is because I am looking at just a sample and not the whole distribution, so it is not perfectly normal. But if that is the case:

  1. What characteristics should I expect from this sample?

  2. Isn't the tutorial's suggestion wrong, since it will never be a normal distribution?

Upvotes: 1

Views: 931

Answers (1)

James

Reputation: 36598

You have a sample size of 6. That is not large enough for the sample mean and standard deviation to land close to those of the underlying normal distribution. Try it with 600 or 6000 to get a good representation of the distribution.

import numpy as np

x = np.random.randn(600)
x.mean(), x.std()
# returns:
(-0.07760043571247623, 0.9664411074909558)

x = np.random.randn(6000)
x.mean(), x.std()
# returns:
(0.003908119246211815, 1.0001989021750033)
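As for what to expect from a small sample: the mean of n independent standard normal draws itself varies from sample to sample with a standard deviation of 1/sqrt(n), which is about 0.41 for n = 6, so a mean of 0.23 is unremarkable. A rough sketch that checks this by simulation is below. The helper name `mean_spread`, the seed, and the number of trials are my own choices for illustration, and it uses the newer `np.random.default_rng` generator rather than `np.random.randn`.

```python
import numpy as np

# Seed chosen arbitrarily so the run is repeatable (an assumption, not from the question).
rng = np.random.default_rng(123)

def mean_spread(n, trials=1000):
    """Estimate how much the sample mean varies across many samples of size n."""
    # Each row is one independent sample of n standard normal draws.
    samples = rng.standard_normal((trials, n))
    # Standard deviation of the per-sample means across all trials.
    return samples.mean(axis=1).std()

for n in (6, 600, 6000):
    # Compare the simulated spread with the theoretical value 1/sqrt(n).
    print(n, mean_spread(n), 1 / np.sqrt(n))
```

The two printed numbers should roughly agree for each n, and both shrink as n grows. That is the sense in which the tutorial's claim holds: a finite sample never matches the distribution exactly, but its statistics get closer as the sample gets larger.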

The expected value of a roll of a fair 6-sided die is 3.5. However, if you only roll it 6 times, it is unlikely that your average will be exactly 3.5.
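The die analogy can be tried directly. This is a small sketch, assuming a fair die simulated with `rng.integers`; the helper name `average_roll` and the seed are hypothetical choices of mine.

```python
import numpy as np

# Arbitrary seed for repeatability (an assumption for this sketch).
rng = np.random.default_rng(0)

def average_roll(n_rolls):
    """Average of n_rolls simulated rolls of a fair 6-sided die."""
    # integers() excludes the upper bound, so 7 yields faces 1 through 6.
    rolls = rng.integers(1, 7, size=n_rolls)
    return rolls.mean()

for n in (6, 600, 60000):
    print(n, average_roll(n))
```

With only 6 rolls the average can easily be off by half a point or more, while with tens of thousands of rolls it should sit very close to 3.5.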

Upvotes: 4
