I am trying to follow this tutorial from Quantopian, which tries to show that samples increasingly exhibit the characteristics of a normal distribution as their size grows.
I tried to generate a normal distribution using the numpy.random.randn() method, as shown in the tutorial.
I understand that this method returns a sample from the standard normal distribution, which has mean = 0 and standard deviation = 1.
But when I check the mean and standard deviation of my sample, they show unexpected values, i.e. mean = 0.23 and standard deviation = 0.49.
CODE:
import numpy as np
import matplotlib.pyplot as plt
#np.random.seed(123)  # uncomment to get the same draws on every run
normal = np.random.randn(6)  # 6 draws from the standard normal distribution
print(normal.mean())
print(normal.std())
RESULT:
0.231567632423
0.488577812058
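One detail that matters for a sample this small: ndarray.std() divides by n by default (ddof=0), whereas the usual sample standard deviation divides by n - 1 (ddof=1). The sketch below, using an arbitrary seed and NumPy's newer Generator API rather than np.random.randn, just shows how far apart the two estimates are for n = 6; it does not change the fact that 6 draws give a noisy estimate either way.

```python
import numpy as np

# Seeded generator so the run is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(123)
sample = rng.standard_normal(6)

n = sample.size
pop_std = sample.std()           # ddof=0: divides by n (NumPy's default)
sample_std = sample.std(ddof=1)  # ddof=1: divides by n - 1

# The two estimates differ by a factor of sqrt(n / (n - 1)), about 1.095 for n = 6.
print(pop_std, sample_std, sample_std / pop_std)
```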
I am guessing this could be because I am looking at just a sample rather than the whole distribution, so it is not perfectly normal. But if that is the case:
What characteristics should I expect from this sample?
Isn't the tutorial's suggestion wrong, since a sample will never be exactly a normal distribution?
You have a sample size of 6. That is not large enough to get a close approximation of the normal distribution. Try it with 600 or 6000 to get a good representation of the distribution.
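A minimal sketch of the same idea over a wider range of sizes, assuming NumPy's Generator API (default_rng / standard_normal) and an arbitrary seed. The exact numbers depend on the seed, but the mean should drift toward 0 and the standard deviation toward 1 as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw standard-normal samples of increasing size and record their statistics.
sizes = [6, 60, 600, 6000, 60000, 600000]
results = {}
for n in sizes:
    x = rng.standard_normal(n)
    results[n] = (x.mean(), x.std())
    print(f"n={n:>6}: mean={x.mean():+.4f}, std={x.std():.4f}")
```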
import numpy as np
x = np.random.randn(600)
x.mean(), x.std()
# returns:
(-0.07760043571247623, 0.9664411074909558)
x = np.random.randn(6000)
x.mean(), x.std()
# returns:
(0.003908119246211815, 1.0001989021750033)
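As for what to expect from a single sample of 6: the sample mean is itself random. For n independent standard-normal draws it is normally distributed with standard deviation 1/sqrt(n), about 0.41 for n = 6, so a mean of 0.23 is well within the typical range. The sketch below, with an arbitrary seed and trial count, checks this by repeating the size-6 experiment many times.

```python
import numpy as np

rng = np.random.default_rng(42)
n, trials = 6, 100_000

# Each row is one independent sample of size 6; take the mean of each row.
sample_means = rng.standard_normal((trials, n)).mean(axis=1)

# Theory: the mean of n standard-normal draws is N(0, 1/n),
# so its standard deviation (the standard error) is 1/sqrt(6), about 0.408.
print(sample_means.std(), 1 / np.sqrt(n))

# Fraction of size-6 samples whose mean is at least as far from 0 as 0.23.
frac_far = np.mean(np.abs(sample_means) >= 0.23)
print(frac_far)
```

With these settings, more than half of the size-6 samples land at least 0.23 away from 0, which is why the original result is not a sign that anything is wrong.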
The expected value of a roll of a fair 6-sided die is 3.5. However, if you roll it only 6 times, the average of those rolls is unlikely to be exactly 3.5.
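The die analogy can be simulated the same way. This is a small sketch assuming a fair die and an arbitrary seed; Generator.integers(1, 7) draws from 1 through 6 because its upper bound is exclusive.

```python
import numpy as np

rng = np.random.default_rng(1)

# integers(1, 7) draws uniformly from {1, ..., 6}; the upper bound is exclusive.
few_rolls = rng.integers(1, 7, size=6)
many_rolls = rng.integers(1, 7, size=60_000)

print(few_rolls.mean())   # often noticeably away from 3.5
print(many_rolls.mean())  # close to the expected value 3.5
```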