Why does scipy.stats.rv_continuous choose the upper bound far too many times?

Question

I include the code I've written below. For some reason the upper bound of 0.804 is overly sampled when compared to the initial distribution. This occurs for both the distributions I am using.

Is this a common problem for rv_continuous or am I missing something?

import matplotlib.pyplot as plt
import scipy.stats as st

class Disk_pdf(st.rv_continuous):
    def _pdf(self,x):
        return (x*(1-np.exp((x-0.804)/0.2539)))/((1+x)*(x**2+0.0256**2)**0.5)

Disk_cv = Disk_pdf(a=0,b=0.804,name='Disk_pdf')
Disk_dist = Disk_cv.rvs(size = 10000)
plt.figure()
plt.hist(Disk_dist,100)




class Bulge_pdf(st.rv_continuous):
    def _pdf(self,x):
        return x*np.exp(-2.368*x-6.691*x**2)
Bulge_cv = Bulge_pdf(a=0,b=0.804,name='Bulge_pdf')

Bulge_dist = Bulge_cv.rvs(size = 10000)
plt.figure()
plt.hist(Bulge_dist,100)

Images of the initial distributions and the histograms created using rv_continuous are available below. I have two images of the histograms, one zoomed in to show that the distribution is captured by the method other than the over sampled upper bound. The other image shows the histogram on a y scale that shows how bad the over sampling problem is.

Initial Disk galaxies' distribution and histograms made using rv_continuous which have over sampled upper bound.

Initial Bulge dominated galaxies' distribution and histograms made using rv_continuous which have over sampled upper bound.

ev-br · Accepted Answer

The pdf must be normalized, and yours does not seem to be:

In [6]: from scipy.integrate import quad

In [7]: quad(Disk_cv.pdf, 0, 0.804)
Out[7]: (0.41121809643549406, 4.005573481922018e-09)

Why does scipy.stats.rv_continuous choose the upper bound far too many times?

Answers (1)

Related Questions