juan
juan

Reputation: 21

Log normal distribution

I have a question regarding lognormal distribution. I want to create and ensemble of objects with "masses" from 10 to 10**5 that are normally distributed. I thought this would be a a lognormal distribution and so I started trying to do this in python like so:

mu, sigma = 3., 1. # mean and standard deviation
s = np.random.lognormal(mu, sigma, 1000)
count, bins, ignored = plt.hist(s, 1000, density=True, align='mid')
x = np.linspace(min(bins), max(bins), 1000)
pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi)))
plt.plot(x, pdf, linewidth=2, color='r')
plt.xscale('log')
plt.show()

as is shown in the example from numpy, but changing mu and sigma and looking at the plots, I can't really tell if setting m and v (following the wikipedia article linked below) to 10**5 and 1000 lets say gives me what I want

I looked at https://en.wikipedia.org/wiki/Log-normal_distribution to figure out how to calculate mu and sigma, but maybe im doing something else wrong. Is this the right approach to this problem?

I read previous questions/answers regarding the lognormal distribution, but i don't think they asked the same thing. Sorry in advanced if this type of question has already been answered.

mu, sigma = 3., 1. Is what's given in the example, This works fine, but when i change mu and sigma to values like:

m=10**3.5 #where I want the distribution to be centered
v=10000   #the "spread" that I want 
f=1.+(v/m2)
mu=np.log(m/np.sqrt(f))
sigma=np.sqrt(np.log(f))

I don't get what i expected.. which is a distribution centered around 10**3.5 with a std of 10000.

Trying what was suggested:

mu=np.log(3000)
sigma=np.log(10)
s = np.random.lognormal(mu, sigma, 1000)
count, bins, ignored = plt.hist(s, 500, density=True, align='mid')
x = np.linspace(min(bins), max(bins), 1000)
pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi)))
plt.semilogx(x, pdf, linewidth=2, color='r')

This doesn't seem to work either, unless im misinterpreting the histogram histogram

Upvotes: 1

Views: 2033

Answers (2)

Bill M.
Bill M.

Reputation: 1548

If you know that you want 1000 values that are log-normally distributed distribution (i.e., log(x) gives you normal distribution), and you want your data to range from 10 to 10^5, then you have to do some calculations to get mu and sigma. But the values you have to plug into np.random.lognormal are the mean and standard deviation of the underlying, related normal distribution, not the log-normal distribution. You can derive these from the mean and variance formulas given on the Wikipedia page you saw.

# Parameters
xmax = 10**5
xmin = 10
n = 1000

# Get original mean and variance
# mu: We want normal distribution, so just take the average of the extremes.
# sigma: use the z = (x - mu)/sigma formula and approximation that 
#        the extremes are a deviation of z=3 away.
mu = (xmax + xmin)/2.0
sigma = (xmax - mu)/3.0
m = mu
v = sigma**2

# Get the mean and standard deviation of the underlying normal distribution
norm_mu = np.log(m**2 / np.sqrt(v + m**2))
norm_sigma = np.sqrt((v / m**2)+1)

# Generate random data and an overlying smooth curve
# (This is the same as your code, except I replaced the parameters
# in the 'pdf =' formula.)
s = np.random.lognormal(norm_mu, norm_sigma, n)
count, bins, ignored = plt.hist(s, n, density=True, align='mid')
x = np.linspace(min(bins), max(bins), n)
pdf = (np.exp(-(np.log(x) - norm_mu)**2 / (2 * norm_sigma**2)) / (x * norm_sigma * np.sqrt(2 * np.pi)))
plt.plot(x, pdf, linewidth=2, color='r')
plt.xscale('log')
plt.show()

Here's what I get. Notice the scaling on the x-axis goes up exponentially, not linearly. Is this the sort of thing you're looking for?

enter image description here

Upvotes: 0

wltrimbl
wltrimbl

Reputation: 116

I think you are having difficulty interpreting the parameters of the distribution.

The documentation for np.random.lognormal is here: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.lognormal.html

In particular, the mean is not mu or 10**mu, but exp(mu), so your distribution as given has a mean of e**3 ≈ 20.

You seem to want the mean to be about 1000, so setting mu and sigma to

mu, sigma  = np.log(1000), np.log(10)`

will generate the distribution that you were expecting.

Upvotes: 0

Related Questions