tmo

Fitting and Plotting Lognormal

I'm having trouble doing something as relatively simple as:

  1. Draw N samples from a Gaussian with some mean and variance
  2. Exponentiate those N samples, so that the results are lognormally distributed
  3. Fit a lognormal (using stats.lognorm.fit)
  4. Spit out a nice and smooth lognormal pdf without inf values (using stats.lognorm.pdf)

Here's a small working example of the output I'm getting:

from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
import math

%matplotlib inline


def lognormDrive(mu,variance):
    size = 1000
    sigma = math.sqrt(variance)
    np.random.seed(1)
    gaussianData = stats.norm.rvs(loc=mu, scale=sigma, size=size)
    logData = np.exp(gaussianData)
    shape, loc, scale = stats.lognorm.fit(logData, floc=mu)
    return stats.lognorm.pdf(logData, shape, loc, scale)

plt.plot(lognormDrive(37,0.8))

[Image: plot produced by the script above]

And as you might notice, the plot makes absolutely no sense.

Any ideas?

I've followed these posts: POST1 POST2

Thanks in advance!

Elaboration: I am building a small script that will

  1. Take raw data and fit a kernel density estimate (the empirical distribution)
  2. Assume different distributions given the mean and variance of the data. These would be a Gaussian and a lognormal
  3. Plot those distributions together with the empirical distribution using interact
  4. Calculate the Kullback-Leibler divergence between the different distributions when one turns the knob for the mean and variance (and eventually the skew)
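
For illustration, the four steps above could be sketched roughly as follows. This is only a minimal sketch under several assumptions: the raw data is replaced by synthetic lognormal samples, the candidate distributions are moment-matched to the sample mean and variance, and the divergence is approximated on a fixed grid with stats.entropy (which normalises both inputs before computing the sum of p*log(p/q)). The names raw, grid and kl_on_grid are made up for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Stand-in for the raw data (assumption: synthetic lognormal samples).
raw = np.exp(rng.normal(1.0, 0.5, size=2000))

# Step 1: kernel density estimate of the data.
kde = stats.gaussian_kde(raw)

# Step 2: candidate distributions matched to the sample mean and variance.
m, v = raw.mean(), raw.var()
sigma2 = np.log(1.0 + v / m**2)      # lognormal sigma^2 from mean and variance
mu = np.log(m) - sigma2 / 2.0        # lognormal mu from mean and variance

# Step 3 would plot these three curves; here they are only evaluated.
grid = np.linspace(raw.min(), raw.max(), 1000)
p = kde(grid)
q_norm = stats.norm.pdf(grid, loc=m, scale=np.sqrt(v))
q_lognorm = stats.lognorm.pdf(grid, np.sqrt(sigma2), loc=0, scale=np.exp(mu))


def kl_on_grid(p_vals, q_vals):
    """Grid approximation of KL(p || q); both inputs are renormalised."""
    return stats.entropy(p_vals, q_vals)


# Step 4: divergence of each candidate from the kernel estimate.
kl_norm = kl_on_grid(p, q_norm)
kl_lognorm = kl_on_grid(p, q_lognorm)
```

In an interactive version, m and v would come from the sliders instead of from the data, and the two divergences would be recomputed on each change.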

Answers (1)

Warren Weckesser

In the call to lognorm.fit(), use floc=0, not floc=mu.

(The location parameter of the lognorm distribution simply translates the distribution. You almost never want to do that with the log-normal distribution.)
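
To make the role of the parameters concrete, here is a small sketch (the values of mu, sigma and shift are illustrative, not taken from the question). With loc fixed at 0, the fitted shape estimates the standard deviation of the underlying Gaussian and log(scale) estimates its mean; a nonzero loc only slides the same curve along the x-axis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma = 2.0, 0.5                       # illustrative values only
samples = np.exp(rng.normal(mu, sigma, size=5000))

# With loc fixed at 0: shape estimates sigma, scale estimates exp(mu).
shape, loc, scale = stats.lognorm.fit(samples, floc=0)
mu_hat = np.log(scale)
sigma_hat = shape

# A nonzero loc translates the density without changing its form:
# pdf(x + shift; loc=shift) equals pdf(x; loc=0).
x = np.linspace(0.1, 30.0, 50)
shift = 3.0
pdf_unshifted = stats.lognorm.pdf(x, shape, loc=0, scale=scale)
pdf_shifted = stats.lognorm.pdf(x + shift, shape, loc=shift, scale=scale)
```

This is why floc=mu goes wrong in the question: it shifts the support of the distribution by mu instead of telling the fit anything about the mean of the underlying Gaussian.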

See A lognormal distribution in python

By the way, you are plotting the PDF of the unsorted sample values, so the plot in the corrected script won't look much different. You might find it more useful to plot the PDF against the sorted values. Here's a modification of your script that creates a plot of the PDF using the sorted samples:

from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
import math


def lognormDrive(mu,variance):
    size = 1000
    sigma = math.sqrt(variance)
    np.random.seed(1)
    gaussianData = stats.norm.rvs(loc=mu, scale=sigma, size=size)
    logData = np.exp(gaussianData)
    shape, loc, scale = stats.lognorm.fit(logData, floc=0)
    print("Estimated mu:", np.log(scale))
    print("Estimated var: ", shape**2)
    logData.sort()
    return logData, stats.lognorm.pdf(logData, shape, loc, scale)

x, y = lognormDrive(37, 0.8)
plt.plot(x, y)
plt.grid()
plt.show()

The script prints:

Estimated mu: 37.0347152587
Estimated var:  0.769897988163

and creates the following plot:

[Image: plot of the fitted lognormal PDF against the sorted samples]
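
If you want a smooth curve that does not depend on where the samples happen to fall, another option is to evaluate the fitted PDF on an evenly spaced grid. The following is a sketch of that idea, not part of the script above; the grid limits (the 0.1% and 99% quantiles of the fitted distribution) and the names x_grid and y_grid are arbitrary choices for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, variance = 37.0, 0.8
data = np.exp(rng.normal(mu, np.sqrt(variance), size=1000))

shape, loc, scale = stats.lognorm.fit(data, floc=0)

# Evenly spaced grid between two quantiles of the fitted distribution.
# Every grid point is strictly positive, so the PDF is finite everywhere.
x_lo = stats.lognorm.ppf(0.001, shape, loc, scale)
x_hi = stats.lognorm.ppf(0.99, shape, loc, scale)
x_grid = np.linspace(x_lo, x_hi, 500)
y_grid = stats.lognorm.pdf(x_grid, shape, loc, scale)
```

Passing x_grid and y_grid to plt.plot then gives the smooth curve.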
