Reputation: 2058
Following this post, I tried to create a logit-normal distribution by creating the LogitNormal
class:
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import logit
from scipy.stats import norm, rv_continuous
class LogitNormal(rv_continuous):
def _pdf(self, x, **kwargs):
return norm.pdf(logit(x), **kwargs)/(x*(1-x))
class OtherLogitNormal:
def pdf(self, x, **kwargs):
return norm.pdf(logit(x), **kwargs)/(x*(1-x))
fig, ax = plt.subplots()
values = np.linspace(10e-10, 1-10e-10, 1000)
sigma, mu = 1.78, 0
ax.plot(
values, LogitNormal().pdf(values, loc=mu, scale=sigma), label='subclassed'
)
ax.plot(
values, OtherLogitNormal().pdf(values, loc=mu, scale=sigma),
label='not subclassed'
)
ax.legend()
fig.show()
However, the LogitNormal
class does not produce the desired results. When I don't subclass rv_continuous
it works. Why is that? I need the subclassing to work because I also need the other methods that come with it like rvs
.
Btw, the only reason I am creating my own logit-normal distribution in Python is because the only implementations of that distribution that I could find were from the PyMC3 package and from the TensorFlow package, both of which are pretty heavy / overkill if you only need them for that one function. I already tried PyMC3
, but apparently it doesn't do well with scipy
I think, it always crashed for me. But that's a whole different story.
Upvotes: 6
Views: 2224
Reputation: 11002
I came across this problem this week and the only relevant issue I have found about it is this post. I have almost same requirement as the OP:
But I also need:
scipy
random variable interface.As @Jacques Gaudin
pointed out the interface for rv_continous
(see distribution architecture for details) does not ensure follow up for loc
and scale
parameters when inheriting from this class. And this is somehow misleading and unfortunate.
Implementing the __init__
method of course allow to create the missing binding but the trade off is: it breaks the pattern scipy
is currently using to implement random variables (see an example of implementation for lognormal).
So, I took time to dig into the scipy
code and I have created a MCVE for this distribution. Although it is not totally complete (it mainly misses moments overrides) it fits the bill for both OP and my purposes while having satisfying accuracy and performance.
An interface compliant implementation of this random variable could be:
class logitnorm_gen(stats.rv_continuous):
def _argcheck(self, m, s):
return (s > 0.) & (m > -np.inf)
def _pdf(self, x, m, s):
return stats.norm(loc=m, scale=s).pdf(special.logit(x))/(x*(1-x))
def _cdf(self, x, m, s):
return stats.norm(loc=m, scale=s).cdf(special.logit(x))
def _rvs(self, m, s, size=None, random_state=None):
return special.expit(m + s*random_state.standard_normal(size))
def fit(self, data, **kwargs):
return stats.norm.fit(special.logit(data), **kwargs)
logitnorm = logitnorm_gen(a=0.0, b=1.0, name="logitnorm")
This implementation unlock most of the scipy
random variables potential.
N = 1000
law = logitnorm(0.24, 1.31) # Defining a RV
sample = law.rvs(size=N) # Sampling from RV
params = logitnorm.fit(sample) # Infer parameters w/ MLE
check = stats.kstest(sample, law.cdf) # Hypothesis testing
bins = np.arange(0.0, 1.1, 0.1) # Bin boundaries
expected = np.diff(law.cdf(bins)) # Expected bin counts
As it relies on scipy
normal distribution we may assume underlying functions have the same accuracy and performance than normal random variable object. But it might indeed be subject to float arithmetic inaccuracy especially when dealing with highly skewed distributions at the support boundary.
To check out how it performs we draw some distribution of interest and check them. Let's create some fixtures:
def generate_fixtures(
locs=[-2.0, -1.0, 0.0, 0.5, 1.0, 2.0],
scales=[0.32, 0.56, 1.00, 1.78, 3.16],
sizes=[100, 1000, 10000],
seeds=[789, 123456, 999999]
):
for (loc, scale, size, seed) in itertools.product(locs, scales, sizes, seeds):
yield {"parameters": {"loc": loc, "scale": scale}, "size": size, "random_state": seed}
And perform checks on related distributions and samples:
eps = 1e-8
x = np.linspace(0. + eps, 1. - eps, 10000)
for fixture in generate_fixtures():
# Reference:
parameters = fixture.pop("parameters")
normal = stats.norm(**parameters)
sample = special.expit(normal.rvs(**fixture))
# Logit Normal Law:
law = logitnorm(m=parameters["loc"], s=parameters["scale"])
check = law.rvs(**fixture)
# Fit:
p = logitnorm.fit(sample)
trial = logitnorm(*p)
resample = trial.rvs(**fixture)
# Hypothetis Tests:
ks = stats.kstest(check, trial.cdf)
bins = np.histogram(resample)[1]
obs = np.diff(trial.cdf(bins))*fixture["size"]
ref = np.diff(law.cdf(bins))*fixture["size"]
chi2 = stats.chisquare(obs, ref, ddof=2)
Some adjustments with n=1000, seed=789
(this sample is quite normal) are shown below:
Upvotes: 6
Reputation: 16958
If you look at the source code of the pdf
method, you will notice that _pdf
is called without the scale
and loc
keyword arguments.
if np.any(cond): goodargs = argsreduce(cond, *((x,)+args+(scale,))) scale, goodargs = goodargs[-1], goodargs[:-1] place(output, cond, self._pdf(*goodargs) / scale)
It results that the kwargs
in your overriding _pdf
method is always an empty dictionary.
If you look a bit closer at the code, you will also notice that the scaling and location are handled by pdf
as opposed to _pdf
.
In your case, the _pdf
method calls norm.pdf
so the loc
and scale
parameters must somehow be available in LogitNormal._pdf
.
You could for example pass scale
and loc
when creating an instance of LogitNormal
and store the values as class attributes:
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import logit
from scipy.stats import norm, rv_continuous
class LogitNormal(rv_continuous):
def __init__(self, scale=1, loc=0):
super().__init__(self)
self.scale = scale
self.loc = loc
def _pdf(self, x):
return norm.pdf(logit(x), loc=self.loc, scale=self.scale)/(x*(1-x))
fig, ax = plt.subplots()
values = np.linspace(10e-10, 1-10e-10, 1000)
sigma, mu = 1.78, 0
ax.plot(
values, LogitNormal(scale=sigma, loc=mu).pdf(values), label='subclassed'
)
ax.legend()
fig.show()
Upvotes: 3