Reputation: 2158
I would like to define a bounded version of a continuous random variable distribution (say, an exponential, but I might want to use others as well). The bounds are 0 and 1. I would like to
scipy.stats.rv_continuous.rvs
), scipy.stats.rv_continuous.ppf
), and possiblyscipy.stats.rv_continuous.cdf
)Possible approaches I can think of:
Getting random variates in an ad hoc way is not difficult
import scipy.stats
d = scipy.stats.expon(0, 3/10.) # an exponential distribution as an example
rv = d.rvs(size=target_number_of_rv)
rv = rv[0=<rv]
rv = rv[rv<=1]
while len(rv) < target_number_of_rv:
rv += d.rvs(1)
rv = rv[0=<rv]
rv = rv[rv<=1]
but 1) this is non-generic and potentially error-prone and 2) it does not help with the ppf or cdf.
Subclassing scipy.stats.rv_continuous, as is done here and here. Thereby, the ppf of scipy.stats.rv_continuous can be used. The drawback is that it requires the pdf (not just a pre-defined rv_continuous object or the pdf of the unbounded distribution and the bounds), and if this is wrong, cdf and ppf and everything else will be wrong as well.
Designing a class that cares for applying the bounds to the rv generation and for correcting the value of the ppf obtained from the unbounded object in scipy.stats. A drawback is that this is non-generic and error-prone as well and that it may be difficult to correct the ppf. My feeling is that the value of the cdf of the unbounded distribution could be scaled by what share of probability mass is out of the bounds (in total, lower and upper), but I may be wrong. That would be for lower and upper bounds l and u and any valid quantile x (with l<=x<=u): (cdf(x)-cdf(l))/(cdf(u)-cdf(l)). Obtaining the ppf would, however, require to invert the resulting function.
My feeling is that there might be a better and more generic way to do this. Is there? Maybe with sympy? Maybe by somehow obtaining the function object of the unbounded cdf and modifying it directly?
Python is version: 3.6.2, scipy is version 0.19.1.
Upvotes: 2
Views: 1687
Reputation: 21663
If the distribution is one of those that is available in scipy.stats
then you can evaluate its integral between the two bounds using the cdf for that distribution. Otherwise, you can define the pdf for rv_continuous
and then use its cdf to get this integral.
Now, you have, in effect, the pdf for the bounded version of the pdf you want because you have calculated the normalising constant for it, in that integral. You can proceed to use rv_continuous
with the form that you have for the pdf plus the normalising constant and with the bounds.
Here's what your code might be like. The variable scale
is set according to the scipy documents. norm
is the integral of the exponential pdf over [0,1]. Only about .49 of the probability mass is accounted for. Therefore, to make the exponential, when truncated to the [0,1] interval give a mass of one we must divide its pdf by this factor.
Truncated_expon
is defined as a subclass of rv_continuous
as in the documentation. By supplying its pdf we make it possible (at least for such a simple integral!) for scipy to calculate this distribution's cdf and thereby to calculate random samples.
I have calculated the cdf at one as a check.
>>> from scipy import stats
>>> lamda = 2/3
>>> scale = 1/lamda
>>> norm = stats.expon.cdf(1, scale=scale)
>>> norm
0.48658288096740798
>>> from math import exp
>>> class Truncated_expon(stats.rv_continuous):
... def _pdf(self, x, lamda):
... return lamda*exp(-lamda*x)/0.48658288096740798
...
>>> e = Truncated_expon(a=0, b=1, shapes='lamda')
>>> e.cdf(1, lamda=lamda)
1.0
>>> e.rvs(size=20, lamda=lamda)
array([ 0.20064067, 0.67646465, 0.89118679, 0.86093035, 0.14334989,
0.10505598, 0.53488779, 0.11606106, 0.41296616, 0.33650899,
0.95126415, 0.57481087, 0.04495104, 0.00308469, 0.23585195,
0.00653972, 0.59400395, 0.34919065, 0.91762547, 0.40098409])
Upvotes: 1