Reputation: 43
I want to define my own bounded distribution, but I see strange behavior at the upper bound. This is my code:
import matplotlib.pyplot as plt
from numpy import exp, sqrt, pi
from scipy.stats import rv_continuous

def gaus(x, mu, sig):
    return exp(-0.5*((x-mu)/sig)**2)/(sig*sqrt(2*pi))

class gaussian_gen(rv_continuous):
    def _pdf(self, x):
        return gaus(x,0.2,0.1)

gaussian = gaussian_gen(a=0.0,b=1)
plt.hist(gaussian.rvs(size=1000),bins=100)
plt.show()
This is the plot when the Gaussian is centered at 0.2. I noticed that this behavior gets stronger when the center of the Gaussian is near the boundaries. What is the problem?
Upvotes: 2
Views: 421
Reputation: 726
The method _pdf() requires a function that is properly normalized over the support [a, b] of the distribution. If the integral over [a, b] is not 1, scipy puts the remaining weight of the PDF at the upper bound. That is why, when your mu is near the bounds, you start seeing this feature at 1.
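As a quick sanity check of how much weight is missing, here is a minimal sketch that uses the closed-form normal CDF from the standard library. The helper names normal_cdf and mass_inside are mine, not part of scipy, and the values are the ones from the question (mu=0.2, sig=0.1, bounds [0, 1]).

```python
import math

def normal_cdf(x, mu, sig):
    """CDF of an untruncated normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sig * math.sqrt(2.0))))

def mass_inside(a, b, mu, sig):
    """Probability mass of an untruncated N(mu, sig) that lies inside [a, b]."""
    return normal_cdf(b, mu, sig) - normal_cdf(a, mu, sig)

# Values from the question: mu=0.2, sig=0.1, bounds [0, 1]
inside = mass_inside(0.0, 1.0, 0.2, 0.1)
missing = 1.0 - inside
print(f"mass inside [0, 1]: {inside:.4f}, missing: {missing:.4f}")
```

For these values roughly 2.3% of the weight lies below 0, so the function passed to _pdf() integrates to about 0.977 instead of 1. Moving mu closer to either bound makes the missing fraction larger, which matches what you observed.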
To correct this, we can modify the class to store the integral of the PDF between the bounds we care about and divide the return value of _pdf() by it. The code below does this.
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import rv_continuous
from scipy.integrate import quad

def gaus(x, mu, sig):
    return np.exp(-0.5*((x-mu)/sig)**2)/(sig*np.sqrt(2*np.pi))

class gaussian_gen(rv_continuous):
    def __init__(self, mu, sig, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.mu = mu
        self.sig = sig
        # Perform integration in the range we care about
        self.integral, _ = quad(gaus, self.a, self.b, args=(self.mu, self.sig))

    def _pdf(self, x):
        # Return the normalized pdf
        return gaus(x, self.mu, self.sig) / self.integral

gaussian = gaussian_gen(0.2, 0.2, a=0., b=1.)
plt.hist(gaussian.rvs(size=1000), bins=100)
plt.show()
This gives a sampled distribution that behaves correctly around the edges.
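If you only need a Gaussian restricted to an interval, scipy.stats.truncnorm might be a simpler route than subclassing rv_continuous. This is a sketch of an alternative, not the approach used above; it assumes the same mu, sig and bounds as in the question. Note that truncnorm expects its bounds in units of standard deviations relative to loc, so they have to be converted first.

```python
from scipy.stats import truncnorm

mu, sig = 0.2, 0.1        # centre and width from the question
lower, upper = 0.0, 1.0   # desired support

# truncnorm takes its bounds in standard-deviation units relative to loc
a = (lower - mu) / sig
b = (upper - mu) / sig
dist = truncnorm(a, b, loc=mu, scale=sig)

# Fixed seed only so that the example is reproducible
samples = dist.rvs(size=1000, random_state=0)
print(samples.min(), samples.max())
```

The resulting distribution is already normalized on [0, 1], so no manual integration is needed.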
Upvotes: 3