Bobesh

Reputation: 1207

Generate random numbers from lognormal distribution in python

I need to generate pseudo-random numbers from a lognormal distribution in Python. The problem is that I am starting from the mode and standard deviation of the lognormal distribution. I don't have the mean or median of the lognormal distribution, nor any of the parameters of the underlying normal distribution.

numpy.random.lognormal takes the mean and standard deviation of the underlying normal distribution. I tried to calculate these from the parameters I have, but wound up with a quartic function. It has a solution, but I hope that there is a more straightforward way to do this.

scipy.stats.lognorm takes parameters that I don't understand. I am not a native English speaker and the documentation doesn't make sense.

Can you help me, please?

Upvotes: 12

Views: 11553

Answers (3)

monade

Reputation: 333

Adding to @WarrenWeckesser's excellent answer, here's a function that computes the exact values needed to reparametrize a lognormal distribution in terms of the mode and the SD:

import numpy as np
def lognorm_params(mode, stddev):
    a = stddev**2 / mode**2
    x = 1/4*np.sqrt(-(16*(2/3)**(1/3)*a)/(np.sqrt(3)*np.sqrt(256*a**3+27*a**2)-9*a)**(1/3) +
                    2*(2/3)**(2/3)*(np.sqrt(3)*np.sqrt(256*a**3+27*a**2)-9*a)**(1/3)+1) + \
        1/2*np.sqrt((4*(2/3)**(1/3)*a)/(np.sqrt(3)*np.sqrt(256*a**3+27*a**2)-9*a)**(1/3) -
                    (np.sqrt(3)*np.sqrt(256*a**3+27*a**2)-9*a)**(1/3)/(2**(1/3)*3**(2/3)) +
                    1/(2*np.sqrt(-(16*(2/3)**(1/3)*a)/(np.sqrt(3)*np.sqrt(256*a**3+27*a**2)-9*a)**(1/3) +
                                 2*(2/3)**(2/3)*(np.sqrt(3)*np.sqrt(256*a**3+27*a**2)-9*a)**(1/3)+1))+1/2) + \
        1/4
    shape = np.sqrt(np.log(x))
    scale = mode * x
    return shape, scale

Essentially, I just computed the exact solution of the quartic. The advantages are that the solution is a) exact, b) faster than finding the roots numerically, and c) vectorizable. As in the answer by @WarrenWeckesser, this function returns, for a given mode and SD, the shape and scale parameters used by scipy.stats.lognorm.

Upvotes: 1

Warren Weckesser

Reputation: 114956

You have the mode and the standard deviation of the log-normal distribution. To use the rvs() method of scipy's lognorm, you have to parameterize the distribution in terms of the shape parameter s, which is the standard deviation sigma of the underlying normal distribution, and the scale, which is exp(mu), where mu is the mean of the underlying distribution.
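As a quick check of this parameterization, here is a minimal sketch that builds a frozen lognorm distribution and compares it with the textbook formulas for the mean, median and mode. The values of mu and sigma are arbitrary, chosen only for illustration.

```python
import numpy as np
from scipy.stats import lognorm

# Illustrative parameters of the underlying normal distribution (assumptions).
mu, sigma = 1.5, 0.4

# scipy's parameterization: s = sigma, scale = exp(mu), loc left at its default 0.
dist = lognorm(s=sigma, scale=np.exp(mu))

# Closed-form properties of the log-normal distribution.
expected_mean = np.exp(mu + sigma**2 / 2)
expected_median = np.exp(mu)
expected_mode = np.exp(mu - sigma**2)

print(dist.mean(), expected_mean)
print(dist.median(), expected_median)
# The density should peak at the mode: compare with nearby points.
print(dist.pdf(expected_mode), dist.pdf(1.01 * expected_mode))
```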

You pointed out that making this reparameterization requires solving a quartic polynomial. For that, we can use the numpy.poly1d class. Instances of that class have a roots attribute.

A little algebra shows that exp(sigma**2) is the unique positive real root of the polynomial

x**4 - x**3 - (stddev/mode)**2 = 0

where stddev and mode are the given standard deviation and mode of the log-normal distribution, and for that solution, the scale (i.e. exp(mu)) is

scale = mode*x
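One way to fill in the algebra, sketched from the standard formulas for the mode and variance of the log-normal distribution, with x standing for exp(sigma**2):

```latex
\begin{aligned}
\text{mode} &= e^{\mu - \sigma^2}, \qquad
\text{stddev}^2 = \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2}, \\
x &= e^{\sigma^2} \;\Longrightarrow\; e^{\mu} = \text{mode} \cdot x, \\
\text{stddev}^2 &= (x - 1)\,(\text{mode} \cdot x)^2\, x
                 = \text{mode}^2 \left(x^4 - x^3\right), \\
0 &= x^4 - x^3 - \left(\frac{\text{stddev}}{\text{mode}}\right)^2 .
\end{aligned}
```

Since x**3 * (x - 1) is negative or zero for 0 <= x <= 1 and strictly increasing for x > 1, the equation has exactly one positive root, and that root is greater than 1, so log(x) is positive and sigma is well defined.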

Here's a function that converts the mode and standard deviation to the shape and scale:

import numpy as np

def lognorm_params(mode, stddev):
    """
    Given the mode and std. dev. of the log-normal distribution, this function
    returns the shape and scale parameters for scipy's parameterization of the
    distribution.
    """
    # Coefficients of x**4 - x**3 - (stddev/mode)**2, highest power first.
    p = np.poly1d([1, -1, 0, 0, -(stddev/mode)**2])
    r = p.roots
    # Keep the positive real root; the result is an array of length 1.
    sol = r[(r.imag == 0) & (r.real > 0)].real
    shape = np.sqrt(np.log(sol))
    scale = mode * sol
    return shape, scale

For example,

In [155]: mode = 123

In [156]: stddev = 99

In [157]: sigma, scale = lognorm_params(mode, stddev)

Generate a sample using the computed parameters:

In [158]: from scipy.stats import lognorm

In [159]: sample = lognorm.rvs(sigma, 0, scale, size=1000000)

Here's the standard deviation of the sample:

In [160]: np.std(sample)
Out[160]: 99.12048952171304

And here's some matplotlib code to plot a histogram of the sample, with a vertical line drawn at the mode of the distribution from which the sample was drawn:

In [176]: tmp = plt.hist(sample, density=True, bins=1000, alpha=0.6, color='c', ec='c')

In [177]: plt.xlim(0, 600)
Out[177]: (0, 600)

In [178]: plt.axvline(mode)
Out[178]: <matplotlib.lines.Line2D at 0x12c5a12e8>

The histogram:

[Figure: histogram of the sample, with a vertical line at the mode]

If you want to generate the sample using numpy.random.lognormal() instead of scipy.stats.lognorm.rvs(), you can do this:

In [200]: sigma, scale = lognorm_params(mode, stddev)

In [201]: mu = np.log(scale)

In [202]: sample = np.random.lognormal(mu, sigma, size=1000000)

In [203]: np.std(sample)
Out[203]: 99.078297384090902
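With newer versions of NumPy, the same thing can be done through the Generator interface. The following is only a sketch: it solves the quartic with np.roots instead of calling lognorm_params so that it runs on its own, and the seed and sample size are arbitrary choices.

```python
import numpy as np

mode, stddev = 123.0, 99.0  # same example values as above

# Positive real root of x**4 - x**3 - (stddev/mode)**2 = 0.
roots = np.roots([1, -1, 0, 0, -(stddev / mode) ** 2])
x = roots[np.isclose(roots.imag, 0) & (roots.real > 0)].real[0]

sigma = np.sqrt(np.log(x))  # std. dev. of the underlying normal distribution
mu = np.log(mode * x)       # mean of the underlying normal distribution

rng = np.random.default_rng(12345)  # arbitrary seed, for reproducibility only
sample = rng.lognormal(mean=mu, sigma=sigma, size=200_000)

# The sample standard deviation should be close to the requested stddev.
print(np.std(sample))
```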

I haven't looked into how robust poly1d's roots algorithm is, so be sure to test for a wide range of possible input values. Alternatively, you can use a solver from scipy to solve the above polynomial for x. You can bound the solution using:

max(sqrt(stddev/mode), 1) <= x <= sqrt(stddev/mode) + 1
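For instance, a bracketing solver such as scipy.optimize.brentq can use these bounds directly. This is a sketch, not a tested replacement: the function name lognorm_params_brentq is made up for illustration, and it assumes mode > 0 and stddev > 0 so that the polynomial changes sign between the two bounds.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import lognorm

def lognorm_params_brentq(mode, stddev):
    """Hypothetical variant of lognorm_params that uses a bracketed solver."""
    a = (stddev / mode) ** 2
    r = np.sqrt(stddev / mode)
    lo, hi = max(r, 1.0), r + 1.0  # the bounds given above
    # The polynomial is negative at lo and positive at hi, so brentq applies.
    x = brentq(lambda t: t**4 - t**3 - a, lo, hi)
    shape = np.sqrt(np.log(x))
    scale = mode * x
    return shape, scale

shape, scale = lognorm_params_brentq(123.0, 99.0)
dist = lognorm(s=shape, scale=scale)
print(dist.std())                   # should reproduce the requested stddev
print(scale * np.exp(-shape**2))    # mode implied by the computed parameters
```

Unlike the poly1d version, this returns plain scalars and handles one (mode, stddev) pair at a time.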

Upvotes: 17

zwol

Reputation: 140856

The log-normal distribution is (confusingly) the result of applying the exponential function to a normal distribution. Wikipedia gives the relationship between the parameters as

mu = log(m/sqrt(1 + v/m^2)), sigma = sqrt(log(1 + v/m^2))

where μ and σ are the mean and standard deviation of what you call the "underlying normal distribution", and m and v are the mean and variance of the log-normal distribution.
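In code, that conversion might look like the sketch below. The helper name normal_params_from_mean_var and the numeric values are made up for illustration; the round trip at the end uses the standard formulas for the mean and variance of the log-normal distribution.

```python
import numpy as np

def normal_params_from_mean_var(m, v):
    """Return (mu, sigma) of the underlying normal distribution, given the
    mean m and variance v of the log-normal distribution."""
    ratio = 1.0 + v / m**2
    mu = np.log(m / np.sqrt(ratio))
    sigma = np.sqrt(np.log(ratio))
    return mu, sigma

# Round trip with illustrative values (assumptions, not from the question).
m, v = 180.0, 99.0**2
mu, sigma = normal_params_from_mean_var(m, v)
mean_back = np.exp(mu + sigma**2 / 2)
var_back = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
print(mean_back, var_back)
```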

Now, what you say you have is the mode and standard deviation of the log-normal distribution. The variance v is just the square of the standard deviation. Getting from the mode to m is trickier: again quoting that Wikipedia article, if the mean is exp(mu + sigma^2/2) then the mode is exp(mu - sigma^2). From this, and the above, we can deduce that

log m = log n + 3/2 log (1 + v/m^2)

where n is the mode of the log-normal distribution and v, m are as above. This reduces to a polynomial equation that is quartic in m^2,

m^8 = n^2m^6 + 3vn^2m^4 + 3n^2v^2m^2 + n^2v^3

or

u^4 - n^2u^3 - 3vn^2u^2 - 3n^2v^2u - n^2v^3 = 0

where u = m^2. I suspect this is the same quartic you mentioned in your question. It can be solved, but like most quartics, the radical form of the solution is a giant hairball. The most practical approach for your purposes might be to plug numeric values for n and v into the above and then use a numeric solver to find the positive root(s).
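A rough sketch of that numeric approach, using np.roots on the quartic in u, follows. The function name mean_from_mode_and_variance and the example values are made up for illustration. The coefficients have a single sign change, so by Descartes' rule of signs there is exactly one positive root to pick.

```python
import numpy as np

def mean_from_mode_and_variance(n, v):
    """Solve u**4 - n**2*u**3 - 3*v*n**2*u**2 - 3*n**2*v**2*u - n**2*v**3 = 0
    for its positive real root u = m**2 and return the mean m."""
    n2 = n**2
    roots = np.roots([1.0, -n2, -3 * v * n2, -3 * n2 * v**2, -n2 * v**3])
    real_roots = roots[np.isclose(roots.imag, 0)].real
    u = real_roots[real_roots > 0][0]
    return np.sqrt(u)

# Illustrative values: mode n and variance v of the log-normal distribution.
n, v = 123.0, 99.0**2
m = mean_from_mode_and_variance(n, v)

# Parameters of the underlying normal distribution, from the formulas above.
sigma = np.sqrt(np.log(1 + v / m**2))
mu = np.log(m / np.sqrt(1 + v / m**2))

print(np.exp(mu - sigma**2))  # mode implied by (mu, sigma)
```

The coefficients span many orders of magnitude for inputs like these, so it is worth checking the recovered mode against n before relying on the result.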

Sorry I can't be more help. This is really a math question, not a programming question; you might get more helpful answers on https://math.stackexchange.com/.

Upvotes: 1
