Reputation: 5201
I am trying to calculate E[f(x)]
for some pdf that I generate/estimated from data.
It says in the documentation:
Subclassing
New random variables can be defined by subclassing rv_continuous class and re-defining at least the _pdf or the _cdf method (normalized to location 0 and scale 1) which will be given clean arguments (in between a and b) and passing the argument check method.
If positive argument checking is not correct for your RV then you will also need to re-define the _argcheck method.
So I subclassed and defined _pdf but whenever I try call:
print my_continuous_rv.expect(lambda x: x)
scipy yells at me:
AttributeError: 'your_continuous_rv' object has no attribute 'a'
Which makes sense because I guess its trying to figure out the lower bound of the integral because it also print in the error:
lb = loc + self.a * scale
I tried defining the attribute self.a and self.b as (which I believe are the limits/interval of where the rv is defined):
self.a = float("-inf")
self.b = float("inf")
However, when I do that then it complains and says:
if N > self.numargs:
AttributeError: 'your_continuous_rv' object has no attribute 'numargs'
I was not really sure what numargs was suppose to be but after checking scipy's code on github it looks there is this line of code:
if not hasattr(self, 'numargs'):
# allows more general subclassing with *args
self.numargs = len(shapes)
Which I assume is the shape of the random variable my function was suppose to take.
Currently I am only doing a very simple random variable with a single float as a possible value for it. So I decided to hard code numargs to be 1. But that just lead down the road to more yelling from scipy's part.
Thus, what it boils down is that I think from the documentation its not clear to me what I have to do when I subclass it, because I did what they said, to overwrite _pdf but after doing that it asks me for self.a, which I hardcoded and then it asks me for numargs, and at this point I think I am concluding I don't really know how they want me to subclass, rv_continuous. Does some one know? I have can generate the pdf I want from the data I want to fit and then just be able to get expected values and things like that from the pdf, what else do I have to initialize in rv_continous so that it actually works?
Upvotes: 3
Views: 3305
Reputation: 1516
My guess would be, you have specified a constructor with __init__
, but did not call the base class constructor of rv_continuous
.
If you take the example from the documentation and add a constructor like this
from scipy.stats import rv_continuous
class gaussian_gen(rv_continuous):
"Gaussian distribution"
def __init__(self, **kwargs):
pass
#rv_continuous.__init__(self, **kwargs)
def _pdf(self, x):
return np.exp(-x**2 / 2.) / np.sqrt(2.0 * np.pi)
gaussian = gaussian_gen(name='gaussian')
print(gaussian.pdf(3))
the call to pdf
will fail. If you comment out the call to rv_continous.__init__
, it'll work, because then the base class constructor will add all necessary fields to the instance.
Upvotes: 1
Reputation: 2094
SciPy's rv_histogram method allows you to provide data and it provides the pdf, cdf and random generation methods.
Upvotes: 1
Reputation: 26030
For historical reasons, scipy distributions are instances, so that you need to have an instance of your subclass. For example:
>>> class MyRV(stats.rv_continuous):
... def _pdf(self, x, k):
... return k * np.exp(-k*x)
>>> my_rv = MyRV(name='exp', a=0.) # instantiation
Notice the need to specify the limits of the support: default values are a=-inf
and b=inf
.
>>> my_rv.a, my_rv.b
(0.0, inf)
>>> my_rv.numargs # gets figured out automagically
1
Once you've specified, say, _pdf
, you have a working distribution instance:
>>> my_rv.cdf(4, k=3)
0.99999385578764677
>>> my_rv.rvs(k=3, size=4)
array([ 0.37696127, 1.10192779, 0.02632473, 0.25516446])
>>> my_rv.expect(lambda x: 1, args=(2,)) # k=2 here
0.9999999999999999
Upvotes: 4