Vytautas
Vytautas

Reputation: 177

Python Numpy Random Numbers - inconsistent?

I am trying to generate log-normally distributed random numbers in python (for later MC simulation), and I find the results to be quite inconsistent when parameters are a bit larger.

Below I am generating a series of LogNormals from Normals (and then using Exp) and directly from LogNormals. The resulting means are bearable, but the variances - quite imprecise.. this also holds for mu = 4,5,...

If you re-run the below code a couple of times - the results come back quite different.

Code:

import numpy as np
mu = 10;
tmp1 = np.random.normal(loc=-mu, scale=np.sqrt(mu*2),size=1e7)
tmp1 = np.exp(tmp1)
print tmp1.mean(), tmp1.var()
tmp2 = np.random.lognormal(mean=-mu, sigma=np.sqrt(mu*2), size=1e7)
print tmp2.mean(), tmp2.var()
print 'True Mean:', np.exp(0), 'True Var:',(np.exp(mu*2)-1)

Any advice how to fix this? I've tried this also on Wakari.io - so the result is consistent there as well

Update: I've taken the 'True' Mean and Variance formula from Wikipedia: https://en.wikipedia.org/wiki/Log-normal_distribution

Snapshots of results: 1)

0.798301881219 57161.0894726
1.32976988569 2651578.69947
True Mean: 1.0 True Var: 485165194.41

2)

1.20346203176 315782.004309
0.967106664211 408888.403175
True Mean: 1.0 True Var: 485165194.41

3) Last one with n=1e8 random numbers

1.17719369919 2821978.59163
0.913827160458 338931.343819
True Mean: 1.0 True Var: 485165194.41

Upvotes: 1

Views: 855

Answers (2)

Jblasco
Jblasco

Reputation: 3967

Ok, as you have just built the sample, and using the notation in wikipedia (first section, mu and sigma) and the example given by you:

from numpy import log, exp, sqrt
import numpy as np
mu = -10
scale = sqrt(2*10)   # scale is sigma, not variance
tmp1 = np.random.normal(loc=mu, scale=scale, size=1e8)
# Just checking
print tmp1.mean(), tmp1.std()
# 10.0011028634 4.47048010775, perfectly accurate
tmp1_exp = exp(tmp1)    # Not sensible to use the same name for two samples
# WIKIPEDIA NOTATION!
m = tmp1_exp.mean()     # until proven wrong, this is a meassure of the mean
v = tmp1_exp.var()  # again, until proven wrong, this is sigma**2
#Now, according to wikipedia
print "This: ", log(m**2/sqrt(v+m**2)), "should be similar to", mu
# I get This:  13.9983309499 should be similar to 10
print "And this:", sqrt(log(1+v/m**2)), "should be similar to", scale
# I get And this: 3.39421327037 should be similar to 4.472135955

So, even if the values are not exactly perfect, I wouldn't claim that they are completely wrong.

Upvotes: 1

Robert Kern
Robert Kern

Reputation: 13430

Even with the large sample size that you have, with these parameters, the estimated variance is going to change wildly from run to run. That's just the nature of the fat-tailed lognormal distribution. Try running the np.exp(np.random.normal(...)).var() several times. You will see a similar swing of values as np.random.lognormal(...).var().

In any case, np.random.lognormal() is just implemented as np.exp(np.random.normal()) (well, the C equivalent).

Upvotes: 6

Related Questions