Alberto Mercurio

Reputation: 43

Upper bound problem using scipy.stats.rv_continuous

I want to generate a bounded distribution from a PDF that I define myself, but I see strange behavior at the upper bound. This is my code:

import matplotlib.pyplot as plt
from numpy import exp, pi, sqrt
from scipy.stats import rv_continuous

def gaus(x, mu, sig):
    return exp(-0.5*((x-mu)/sig)**2)/(sig*sqrt(2*pi))

class gaussian_gen(rv_continuous):
    def _pdf(self, x):
        return gaus(x,0.2,0.1)

gaussian = gaussian_gen(a=0.0,b=1)
plt.hist(gaussian.rvs(size=1000),bins=100)
plt.show()

And this is the plot when it is centered at 0.2. I noticed that the strange behavior at the upper bound becomes more pronounced when the center of the Gaussian is near the boundaries. What is the problem?

Upvotes: 2

Views: 421

Answers (1)

alexpiers

Reputation: 726

The _pdf() method must return a density that is properly normalized over the support [a, b]. If its integral over [a, b] is less than 1, scipy puts the remaining weight of the PDF at the upper bound. That is why, when your mu is near the bounds, you start seeing this feature at 1.
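
As a quick check of how much weight is missing, the unnormalized Gaussian from the question can be integrated over the bounds. This is only a sketch using the question's values (mu = 0.2, sig = 0.1, bounds [0, 1]); the variable names `mass` and `missing` are mine, not part of scipy.

```python
import numpy as np
from scipy.integrate import quad


def gaus(x, mu, sig):
    # Gaussian density normalized over the whole real line, not over [a, b]
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))


# Bounds and parameters taken from the question
a, b = 0.0, 1.0
mu, sig = 0.2, 0.1

# Probability mass that actually lies inside [a, b]
mass, _ = quad(gaus, a, b, args=(mu, sig))
missing = 1.0 - mass

print(f"mass inside [a, b]: {mass:.4f}")
print(f"missing mass:       {missing:.4f}")
```

With these values the lower bound sits two standard deviations below mu, so roughly 2.3% of the weight falls outside [0, 1]. That share is what shows up as the spike at the upper bound, and it grows as mu moves closer to either bound.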

To correct this, we can modify the class to store the integral of the PDF between the bounds we care about and divide the return value of _pdf() by it. The code below does this.

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import rv_continuous
from scipy.integrate import quad


def gaus(x, mu, sig):
    return np.exp(-0.5*((x-mu)/sig)**2)/(sig*np.sqrt(2*np.pi))

class gaussian_gen(rv_continuous):
    def __init__(self, mu, sig, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.mu = mu
        self.sig = sig

        # Perform integration in the range we care about
        self.integral, _ = quad(gaus, self.a, self.b, args=(self.mu, self.sig))

    def _pdf(self, x):
        # Return the normalized pdf
        return gaus(x,self.mu, self.sig) / self.integral

gaussian = gaussian_gen(0.2, 0.2, a=0., b=1.)
plt.hist(gaussian.rvs(size=1000),bins=100)
plt.show()

This gives a sampled distribution that is correctly drawn from the PDF and behaves correctly around the edges.
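
If the density really is just a Gaussian cut off at the bounds, an alternative to subclassing is scipy's built-in `scipy.stats.truncnorm`, which handles the normalization itself. The sketch below assumes the same mu, sig and bounds as the example above; note that `truncnorm` expects its shape parameters in standardized units, i.e. (bound - loc) / scale. The names `lower`, `upper`, `a_std` and `b_std` are illustrative.

```python
import numpy as np
from scipy.stats import truncnorm

# Same parameters as the corrected example above
lower, upper = 0.0, 1.0
mu, sig = 0.2, 0.2

# truncnorm takes the bounds in units of standard deviations from loc
a_std = (lower - mu) / sig
b_std = (upper - mu) / sig
dist = truncnorm(a_std, b_std, loc=mu, scale=sig)

# Fixed seed only so the sketch is reproducible
samples = dist.rvs(size=1000, random_state=0)

print(samples.min(), samples.max())  # both lie inside [lower, upper]
print(dist.cdf(upper))               # CDF reaches 1 at the upper bound
```

This only covers the Gaussian case. For an arbitrary custom PDF, the normalized `rv_continuous` subclass is still the way to go.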

Upvotes: 3
