0range
0range

Reputation: 2158

How to obtain a python scipy-type continuous rv distribution object that is bounded?

I would like to define a bounded version of a continuous random variable distribution (say, an exponential, but I might want to use others as well). The bounds are 0 and 1. I would like to

  1. draw random variates (as done by scipy.stats.rv_continuous.rvs),
  2. use the ppf (percentage point function) (as done by scipy.stats.rv_continuous.ppf), and possibly
  3. use the cdf (cumulative density function) (as done by scipy.stats.rv_continuous.cdf)

Possible approaches I can think of:

My feeling is that there might be a better and more generic way to do this. Is there? Maybe with sympy? Maybe by somehow obtaining the function object of the unbounded cdf and modifying it directly?

Python is version: 3.6.2, scipy is version 0.19.1.

Upvotes: 2

Views: 1687

Answers (1)

Bill Bell
Bill Bell

Reputation: 21663

If the distribution is one of those that is available in scipy.stats then you can evaluate its integral between the two bounds using the cdf for that distribution. Otherwise, you can define the pdf for rv_continuous and then use its cdf to get this integral.

Now, you have, in effect, the pdf for the bounded version of the pdf you want because you have calculated the normalising constant for it, in that integral. You can proceed to use rv_continuous with the form that you have for the pdf plus the normalising constant and with the bounds.


Here's what your code might be like. The variable scale is set according to the scipy documents. norm is the integral of the exponential pdf over [0,1]. Only about .49 of the probability mass is accounted for. Therefore, to make the exponential, when truncated to the [0,1] interval give a mass of one we must divide its pdf by this factor.

Truncated_expon is defined as a subclass of rv_continuous as in the documentation. By supplying its pdf we make it possible (at least for such a simple integral!) for scipy to calculate this distribution's cdf and thereby to calculate random samples.

I have calculated the cdf at one as a check.

>>> from scipy import stats
>>> lamda = 2/3
>>> scale = 1/lamda
>>> norm = stats.expon.cdf(1, scale=scale)
>>> norm
0.48658288096740798
>>> from math import exp
>>> class Truncated_expon(stats.rv_continuous):
...     def _pdf(self, x, lamda):
...         return lamda*exp(-lamda*x)/0.48658288096740798
... 
>>> e = Truncated_expon(a=0, b=1, shapes='lamda')
>>> e.cdf(1, lamda=lamda)
1.0
>>> e.rvs(size=20, lamda=lamda)
array([ 0.20064067,  0.67646465,  0.89118679,  0.86093035,  0.14334989,
        0.10505598,  0.53488779,  0.11606106,  0.41296616,  0.33650899,
        0.95126415,  0.57481087,  0.04495104,  0.00308469,  0.23585195,
        0.00653972,  0.59400395,  0.34919065,  0.91762547,  0.40098409])

Upvotes: 1

Related Questions