I am generating a random log-normal distribution using the built-in function. Is it possible to specify a range for a given mean and sigma? For instance, I want a random distribution between 0 and 0.5 with mean=0.2, sigma=0.5.
import numpy as np
r=np.random.lognormal(mean=0.2, sigma=0.5, size=(3,3))
print(r)
Not directly, but you can generate more points and filter out those that fall outside your bounds. With the values you provided, however, only a small fraction of the points will fall inside the range:
np.random.seed(0)
z = np.random.lognormal(mean=0.2, sigma=0.5, size=10_000)
>>> np.sum(z < 0.5) / len(z)
0.0346
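As a sanity check (my addition, not part of the original answer), that ~3.5% acceptance rate can be derived from the log-normal CDF; note that scipy.stats.lognorm parametrizes the distribution with s=sigma and scale=exp(mean):

```python
import numpy as np
from scipy.stats import lognorm

# P(Z < 0.5) for Z ~ logN(mean=0.2, sigma=0.5)
p = lognorm(s=0.5, scale=np.exp(0.2)).cdf(0.5)
print(p)  # ~0.037, in line with the empirical 0.0346
```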
As a side note, be aware that the parameters mean and sigma of np.random.lognormal() refer to the underlying normal distribution, not to the mean and std of the log-normal points:
np.random.seed(0)
z = np.random.lognormal(mean=0.2, sigma=0.5, size=1000)
y = np.log(z)
>>> np.mean(z), np.std(z)
(1.3886515119063163, 0.7414709414626542)
>>> np.mean(y), np.std(y)
(0.2018986489272414, 0.5034218384643446)
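For reference (a quick check of my own, not from the answer), the sample mean and std above agree with the closed-form moments of a log-normal distribution:

```python
import numpy as np

mu, sigma = 0.2, 0.5
# closed-form moments of Z ~ logN(mu, sigma)
mean = np.exp(mu + sigma**2 / 2)
std = mean * np.sqrt(np.exp(sigma**2) - 1)
print(mean, std)  # ~1.384 and ~0.738, close to the sample values above
```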
All that said, if you really want what you asked for, you could do:
shape = 3, 3
zmin, zmax = 0, 0.5
n = np.prod(shape)
zc = np.array([])
while True:
    z = np.random.lognormal(mean=0.2, sigma=0.5, size=n * 100)
    z = z[(zmin <= z) & (z < zmax)]
    zc = np.r_[zc, z]
    if len(zc) >= n:
        break
r = zc[:n].reshape(shape)
>>> r
array([[0.34078793, 0.45366409, 0.40183988],
       [0.43387773, 0.46387512, 0.30535007],
       [0.44248787, 0.32316698, 0.48600577]])
Note, however, that the loop above (which in this case runs on average just once) may run forever if your bounds specify an empty range.
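One safeguard against that failure mode (a sketch of my own, not from the answer) is to cap the number of attempts; the cap and the 100x oversampling factor here are arbitrary choices:

```python
import numpy as np

def bounded_lognormal(zmin, zmax, mean, sigma, shape, max_tries=100):
    # Rejection sampling with a retry cap, so empty or near-empty
    # bounds raise instead of looping forever.
    n = int(np.prod(shape))
    zc = np.array([])
    for _ in range(max_tries):
        z = np.random.lognormal(mean=mean, sigma=sigma, size=n * 100)
        zc = np.r_[zc, z[(zmin <= z) & (z < zmax)]]
        if len(zc) >= n:
            return zc[:n].reshape(shape)
    raise RuntimeError('bounds accept too few samples')

r = bounded_lognormal(0, 0.5, 0.2, 0.5, (3, 3))
```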
Edit
For efficiency, we can instead use scipy.stats.truncnorm. However, there seems to be an issue when loc, scale != 0, 1, so here we roll our own adjustment:
# convert X ~ N(0, 1) to Y ~ N(u, s):
# Y = X * s + u
# want Y in [a, b]:
# so X in [(a - u)/s, (b - u)/s]
#
# We want Z ~ logN(u, s) in [zmin, zmax]
# Z = exp(Y) and
# a, b = np.log((zmin, zmax))
from scipy.stats import truncnorm

def lognormal(a, b, loc=1, scale=1, size=1):
    a, b = np.log((a, b))
    ca, cb = (np.array((a, b)) - loc) / scale
    rv = truncnorm(a=ca, b=cb)
    z = np.exp(rv.rvs(size=size) * scale + loc)
    return z
Example:
zmin, zmax = 0, 0.5
z = lognormal(zmin, zmax, 0.2, 0.5, 10_000)
Or, to silence the rather noisy warning from np.log(0):
import warnings

with warnings.catch_warnings():
    warnings.filterwarnings('ignore', category=RuntimeWarning, message='divide by zero encountered in log')
    z = lognormal(zmin, zmax, 0.2, 0.5, 10_000)
Either way, we now have:
>>> np.min(z), np.max(z)
(0.10787873738453256, 0.4999993445360746)
And the same distribution as one would get with the rejection approach above (r), here with size = 10_000:
def lognormal_naive(a, b, loc=1, scale=1, size=1):
    zc = np.array([])
    while True:
        z = np.random.lognormal(mean=loc, sigma=scale, size=size * 100)
        z = z[(a <= z) & (z < b)]
        zc = np.r_[zc, z]
        if len(zc) >= size:
            break
    return zc[:size]
r = lognormal_naive(zmin, zmax, 0.2, 0.5, 10_000)
import matplotlib.pyplot as plt

fig, ax = plt.subplots(ncols=2, figsize=(7, 2))
ax[0].hist(z, bins=100)
ax[1].hist(r, bins=100);
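Beyond eyeballing the histograms, the agreement between the two samplers can be checked with a two-sample Kolmogorov-Smirnov test. The sketch below regenerates both samples so it stands alone; the seed and sizes are arbitrary:

```python
import numpy as np
from scipy.stats import truncnorm, ks_2samp

rng = np.random.default_rng(0)
loc, scale, zmin, zmax, n = 0.2, 0.5, 0.0, 0.5, 10_000

# truncnorm-based sample (same transform as lognormal() above);
# np.log(0) = -inf is a valid lower bound for truncnorm
with np.errstate(divide='ignore'):
    ca, cb = (np.log((zmin, zmax)) - loc) / scale
z_trunc = np.exp(truncnorm(a=ca, b=cb).rvs(size=n, random_state=rng) * scale + loc)

# rejection-based sample (same idea as lognormal_naive() above)
z = rng.lognormal(mean=loc, sigma=scale, size=n * 100)
z_rej = z[(zmin <= z) & (z < zmax)][:n]

stat, pvalue = ks_2samp(z_trunc, z_rej)
print(stat, pvalue)  # a small statistic is consistent with identical distributions
```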
The truncnorm version, however, is about 30x faster:
%timeit lognormal_naive(zmin, zmax, 0.2, 0.5, 10_000)
64.2 ms ± 60.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit lognormal(zmin, zmax, 0.2, 0.5, 10_000)
2.06 ms ± 3.61 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)