Charlie Parker
Charlie Parker

Reputation: 5201

Using the methods of scipy's rv_continuous when creating a cutom continuous distribution

I am trying to calculate E[f(x)] for some pdf that I generate/estimated from data.

It says in the documentation:

Subclassing

New random variables can be defined by subclassing rv_continuous class and re-defining at least the _pdf or the _cdf method (normalized to location 0 and scale 1) which will be given clean arguments (in between a and b) and passing the argument check method.

If positive argument checking is not correct for your RV then you will also need to re-define the _argcheck method.

So I subclassed and defined _pdf but whenever I try call:

print my_continuous_rv.expect(lambda x: x)

scipy yells at me:

AttributeError: 'your_continuous_rv' object has no attribute 'a'

Which makes sense because I guess its trying to figure out the lower bound of the integral because it also print in the error:

lb = loc + self.a * scale

I tried defining the attribute self.a and self.b as (which I believe are the limits/interval of where the rv is defined):

self.a = float("-inf")
self.b = float("inf")

However, when I do that then it complains and says:

if N > self.numargs:
AttributeError: 'your_continuous_rv' object has no attribute 'numargs'

I was not really sure what numargs was suppose to be but after checking scipy's code on github it looks there is this line of code:

if not hasattr(self, 'numargs'):
    # allows more general subclassing with *args
    self.numargs = len(shapes)

Which I assume is the shape of the random variable my function was suppose to take.

Currently I am only doing a very simple random variable with a single float as a possible value for it. So I decided to hard code numargs to be 1. But that just lead down the road to more yelling from scipy's part.

Thus, what it boils down is that I think from the documentation its not clear to me what I have to do when I subclass it, because I did what they said, to overwrite _pdf but after doing that it asks me for self.a, which I hardcoded and then it asks me for numargs, and at this point I think I am concluding I don't really know how they want me to subclass, rv_continuous. Does some one know? I have can generate the pdf I want from the data I want to fit and then just be able to get expected values and things like that from the pdf, what else do I have to initialize in rv_continous so that it actually works?

Upvotes: 3

Views: 3305

Answers (3)

Elmar Zander
Elmar Zander

Reputation: 1516

My guess would be, you have specified a constructor with __init__, but did not call the base class constructor of rv_continuous.

If you take the example from the documentation and add a constructor like this

from scipy.stats import rv_continuous
class gaussian_gen(rv_continuous):
    "Gaussian distribution"
    def __init__(self, **kwargs):
        pass
        #rv_continuous.__init__(self, **kwargs)
    def _pdf(self, x):
        return np.exp(-x**2 / 2.) / np.sqrt(2.0 * np.pi)

gaussian = gaussian_gen(name='gaussian')
print(gaussian.pdf(3))

the call to pdf will fail. If you comment out the call to rv_continous.__init__, it'll work, because then the base class constructor will add all necessary fields to the instance.

Upvotes: 1

Lars Ericson
Lars Ericson

Reputation: 2094

SciPy's rv_histogram method allows you to provide data and it provides the pdf, cdf and random generation methods.

Upvotes: 1

ev-br
ev-br

Reputation: 26030

For historical reasons, scipy distributions are instances, so that you need to have an instance of your subclass. For example:

>>> class MyRV(stats.rv_continuous):
...    def _pdf(self, x, k):
...      return k * np.exp(-k*x)
>>> my_rv = MyRV(name='exp', a=0.)     # instantiation

Notice the need to specify the limits of the support: default values are a=-inf and b=inf.

>>> my_rv.a, my_rv.b
(0.0, inf)
>>> my_rv.numargs        # gets figured out automagically
1

Once you've specified, say, _pdf, you have a working distribution instance:

>>> my_rv.cdf(4, k=3)
0.99999385578764677
>>> my_rv.rvs(k=3, size=4)
array([ 0.37696127,  1.10192779,  0.02632473,  0.25516446])
>>> my_rv.expect(lambda x: 1, args=(2,))    # k=2 here
0.9999999999999999

Upvotes: 4

Related Questions