eSurfsnake
eSurfsnake

Reputation: 689

Am I doing something wrong when using Scipy.stats when implementing a lognormal function?

I may just be being brain dead, but I have code which defines a lognormal class for manipulation in Python using Scipy.stats.

My class looks like this:

class LogNorm :

def __init__(self, mean, sd, offset=0) :
# uses 'real' units -i.e., 100000, 15000 for mean and sdn
    self.mean = mean
    self.sd = sd
    self.offset = offset
    self.xvar_mu, self.xvar_sigma = get_base_mu_and_sigma(mean, sd)
    self.mean = np.exp(self.xvar_mu + self.xvar_sigma**2.0/2.0)   # reflect that change in the Y 
    self.sd = ((np.exp(self.xvar_sigma**2.0) - 1.0) *
               (np.exp(2.0 * self.xvar_mu + self.xvar_sigma**2.0))) ** 0.5
    self.RV = lognorm(s = self.xvar_sigma, scale = self.mean, loc = self.offset)  # fozen

The idea here is you pass in the mean and sd, as measured, of the lognormal itself. I record those for posterity (assume offset = 0.0, as per the default). Then I have a helper function which maps those into the mu and sigma of the normal distribution that underlies the lognormal. That function looks like so, if useful:

def get_base_mu_and_sigma(mean, sd) :
    mu = math.log(mean**2.0 / (sd**2.0 + mean**2.0)**0.5)
    sigma = (math.log(1.0 + sd**2.0/mean**2.0))**0.5
    return (mu, sigma)

This comes straight from Wikipedia, and seems right (check out the end of the 'Arithmetic Moments' section): https://en.wikipedia.org/wiki/Log-normal_distribution

Then, 'self.RV' becomes a 'frozen' RV, and has a bunch of builtin/inherited functions (mean(), median(), var(), etc) related to the lognormal described by mu and sigma.

The challenge I have is that when I create such an object, and then try examining the mean and sd (via square root of the variance), the numbers don't seem to match. For example, using mean = 110517.09 and sd = 2210.34 (from my application), when I then execute the following code I get inconsistent answers:

        p = rv1.RV.pdf(x)
        print("rv1.mean, rv1.sd = " + str(rv1.mean) + "  " + str(rv1.sd))
        print("rv1.mean(), rv1.var()**0.5 = " + str(rv1.RV.mean()) + "  " + str(rv1.RV.var()**0.5))

gives:

rv1.mean, rv1.sd = 110517.09180756475  2210.341836151173
rv1.mean(), rv1.var()**0.5 = 110539.19301602637  2210.783860320406

Any clue what I am doing wrong?

Upvotes: 1

Views: 233

Answers (1)

Warren Weckesser
Warren Weckesser

Reputation: 114976

You used self.mean for the scale argument of scipy.stats.lognorm, but that is not correct. Quoting from the docstring for lognorm:

A common parametrization for a lognormal random variable Y is in terms of the mean, mu, and standard deviation, sigma, of the unique normally distributed random variable X such that exp(X) = Y. This parametrization corresponds to setting s = sigma and scale = exp(mu).

So when you create self.RV by calling lognorm, the scale argument must be np.exp(self.xvar_mu). (This assumes offset is 0.)

Here's a simplified example that uses your function get_base_mu_and_sigma to convert the parameters.

First, here is your function, and the sample values that you used:

In [154]: def get_base_mu_and_sigma(mean, sd) :
     ...:     mu = math.log(mean**2.0 / (sd**2.0 + mean**2.0)**0.5)
     ...:     sigma = (math.log(1.0 + sd**2.0/mean**2.0))**0.5
     ...:     return (mu, sigma)
     ...: 

In [155]: mean = 110517.09180756475

In [156]: sd = 2210.341836151173

Get the parameters of the underlying normal distribution:

In [157]: mu, sigma = get_base_mu_and_sigma(mean, sd)

Create an instance of scipy's lognorm distribution, and verify that the mean and standard deviation of that distribution match mean and sd:

In [158]: ln = lognorm(s=sigma, loc=0, scale=np.exp(mu))

In [159]: ln.mean()
Out[159]: 110517.09180756476

In [160]: ln.std()
Out[160]: 2210.341836151174

Upvotes: 2

Related Questions