scipy.stat.norm.pdf does not add up to one

Question

I'm trying to compute the integral of a gaussian from an arbitrary point gamma as follows:

import numpy as np
from scipy.stats import norm

def func(mu, sigma, gamma):
    x = np.linspace(mu-4*sigma, mu + 4*sigma, 100)
    y = norm.pdf(x, mu, sigma)

    area = y[x>=gamma].sum()

    print(f"Area is ~ {area:.3f}")

    plt.plot(x,y, label=f'μ0 = {mu}, σ={sigma}')
    plt.fill_between(x[x>gamma],y[x>gamma], alpha=.5, label=f'area={area:.3f}')
    plt.title("Example")
    plt.legend(loc='best')
    plt.show()

# Execute the function
func(0,10,-20)

Output is: Area is ~ 1.211

The blue area is integrated but it adds more than one (far more), even when it's not the complete function.

I found this related but didn't help.

Why is it adding up more than one?

Rodrigo Laguna · Accepted Answer

As @Warren Weckesser says, the problem is that I'm not considering the dx term (the base size for the intergal)

Fortunately this is easy to fix, so for completness, here it goes...

import numpy as np
from scipy.stats import norm

def func(mu, sigma, gamma):
    # Notice the retstep=True param, it will return the dx needed
    x, dx = np.linspace(mu-4*sigma, mu + 4*sigma, 100, retstep=True)
    y = norm.pdf(x, mu, sigma)

    # multiply all with dx
    area = y[x>=gamma].sum() * dx

    print(f"Area is ~ {area:.3f}")

    plt.plot(x,y, label=f'μ0 = {mu}, σ={sigma}')
    plt.fill_between(x[x>gamma],y[x>gamma], alpha=.5, label=f'area={area:.3f}')
    plt.title("Example")
    plt.legend(loc='best')
    plt.show()

func(0,10,-20)

Now output makes sence, it's Area is ~ 0.978

and the image is

Thanks a lot @Warren Weckesser!!

scipy.stat.norm.pdf does not add up to one

Answers (1)

Related Questions