Reputation: 1207
I need to generate pseudo-random numbers from a lognormal distribution in Python. The problem is that I am starting from the mode and standard deviation of the lognormal distribution. I don't have the mean or median of the lognormal distribution, nor any of the parameters of the underlying normal distribution.
numpy.random.lognormal
takes the mean and standard deviation of the underlying normal distribution. I tried to calculate these from the parameters I have, but wound up with a quartic function. It has a solution, but I hope that there is a more straightforward way to do this.
scipy.stats.lognorm
takes parameters that I don't understand. I am not a native English speaker and the documentation doesn't make sense.
Can you help me, please?
Upvotes: 12
Views: 11553
Reputation: 333
Adding to @WarrenWeckesser excellent answer, here's a function that provides the exact return values to reparametrize a lognormal distribution in terms of the mode and the SD:
import numpy as np
def lognorm_params(mode, stddev):
a = stddev**2 / mode**2
x = 1/4*np.sqrt(-(16*(2/3)**(1/3)*a)/(np.sqrt(3)*np.sqrt(256*a**3+27*a**2)-9*a)**(1/3) +
2*(2/3)**(2/3)*(np.sqrt(3)*np.sqrt(256*a**3+27*a**2)-9*a)**(1/3)+1) + \
1/2*np.sqrt((4*(2/3)**(1/3)*a)/(np.sqrt(3)*np.sqrt(256*a**3+27*a**2)-9*a)**(1/3) -
(np.sqrt(3)*np.sqrt(256*a**3+27*a**2)-9*a)**(1/3)/(2**(1/3)*3**(2/3)) +
1/(2*np.sqrt(-(16*(2/3)**(1/3)*a)/(np.sqrt(3)*np.sqrt(256*a**3+27*a**2)-9*a)**(1/3) +
2*(2/3)**(2/3)*(np.sqrt(3)*np.sqrt(256*a**3+27*a**2)-9*a)**(1/3)+1))+1/2) + \
1/4
shape = np.sqrt(np.log(x))
scale = mode * x
return shape, scale
Essentially, I just computed the exact solution of the quartic. The advantages are that the solution is a) exact, b) faster and c) vectorizable. As in the case of the answer by @WarrenWeckesser, this function returns, for a given mode and SD, the parameters shape and scale as used by the scipy function scipy.stats.lognormal().
Upvotes: 1
Reputation: 114956
You have the mode and the standard deviation of the log-normal distribution. To use the rvs()
method of scipy's lognorm
, you have to parameterize the distribution in terms of the shape parameter s
, which is the standard deviation sigma
of the underlying normal distribution, and the scale
, which is exp(mu)
, where mu
is the mean of the underlying distribution.
You pointed out that making this reparameterization requires solving a quartic polynomial. For that, we can use the numpy.poly1d
class. Instances of that class have a roots
attribute.
A little algebra shows that exp(sigma**2)
is the unique positive real root of the polynomial
x**4 - x**3 - (stddev/mode)**2 = 0
where stddev
and mode
are the given standard deviation and mode of the log-normal distribution, and for that solution, the scale
(i.e. exp(mu)
) is
scale = mode*x
Here's a function that converts the mode and standard deviation to the shape and scale:
def lognorm_params(mode, stddev):
"""
Given the mode and std. dev. of the log-normal distribution, this function
returns the shape and scale parameters for scipy's parameterization of the
distribution.
"""
p = np.poly1d([1, -1, 0, 0, -(stddev/mode)**2])
r = p.roots
sol = r[(r.imag == 0) & (r.real > 0)].real
shape = np.sqrt(np.log(sol))
scale = mode * sol
return shape, scale
For example,
In [155]: mode = 123
In [156]: stddev = 99
In [157]: sigma, scale = lognorm_params(mode, stddev)
Generate a sample using the computed parameters:
In [158]: from scipy.stats import lognorm
In [159]: sample = lognorm.rvs(sigma, 0, scale, size=1000000)
Here's the standard deviation of the sample:
In [160]: np.std(sample)
Out[160]: 99.12048952171304
And here's some matplotlib code to plot a histogram of the sample, with a vertical line drawn at the mode of the distribution from which the sample was drawn:
In [176]: tmp = plt.hist(sample, normed=True, bins=1000, alpha=0.6, color='c', ec='c')
In [177]: plt.xlim(0, 600)
Out[177]: (0, 600)
In [178]: plt.axvline(mode)
Out[178]: <matplotlib.lines.Line2D at 0x12c5a12e8>
The histogram:
If you want to generate the sample using numpy.random.lognormal()
instead of scipy.stats.lognorm.rvs()
, you can do this:
In [200]: sigma, scale = lognorm_params(mode, stddev)
In [201]: mu = np.log(scale)
In [202]: sample = np.random.lognormal(mu, sigma, size=1000000)
In [203]: np.std(sample)
Out[203]: 99.078297384090902
I haven't looked into how robust poly1d
's roots
algorithm is, so be sure to test for a wide range of possible input values. Alternatively, you can use a solver from scipy to solve the above polynomial for x
. You can bound the solution using:
max(sqrt(stddev/mode), 1) <= x <= sqrt(stddev/mode) + 1
Upvotes: 17
Reputation: 140856
The log-normal distribution is (confusingly) the result of applying the exponential function to a normal distribution. Wikipedia gives the relationship between the parameters as
where μ and σ are the mean and standard deviation of what you call the "underlying normal distribution", and m and v are the mean and variance of the log-normal distribution.
Now, what you say you have is the mode and standard deviation of the log-normal distribution. The variance v is just the square of the standard deviation. Getting from the mode to m is trickier: again quoting that Wikipedia article, if the mean is then the mode is
. From this, and the above, we can deduce that
where n is the mode of the log-normal distribution and v, m are as above. This reduces to a quartic,
or
where u = m2. I suspect this is the same quartic you mentioned in your question. It can be solved, but like most quartics, the radical form of the solutions are a giant hairball. The most practical approach for your purposes might be to plug numeric values for n and v into the above and then use a numeric solver to find the positive root(s).
Sorry I can't be more help. This is really a math question, not a programming question; you might get more helpful answers on https://math.stackexchange.com/.
Upvotes: 1