Rodrigo Laguna
Rodrigo Laguna

Reputation: 1850

scipy.stat.norm.pdf does not add up to one

I'm trying to compute the integral of a gaussian from an arbitrary point gamma as follows:

import numpy as np
from scipy.stats import norm

def func(mu, sigma, gamma):
    x = np.linspace(mu-4*sigma, mu + 4*sigma, 100)
    y = norm.pdf(x, mu, sigma)

    area = y[x>=gamma].sum()

    print(f"Area is ~ {area:.3f}")

    plt.plot(x,y, label=f'μ0 = {mu}, σ={sigma}')
    plt.fill_between(x[x>gamma],y[x>gamma], alpha=.5, label=f'area={area:.3f}')
    plt.title("Example")
    plt.legend(loc='best')
    plt.show()

# Execute the function
func(0,10,-20)

Output is: Area is ~ 1.211

with this image

The blue area is integrated but it adds more than one (far more), even when it's not the complete function.

I found this related but didn't help.

Why is it adding up more than one?

Upvotes: 1

Views: 891

Answers (1)

Rodrigo Laguna
Rodrigo Laguna

Reputation: 1850

As @Warren Weckesser says, the problem is that I'm not considering the dx term (the base size for the intergal)

Fortunately this is easy to fix, so for completness, here it goes...

import numpy as np
from scipy.stats import norm

def func(mu, sigma, gamma):
    # Notice the retstep=True param, it will return the dx needed
    x, dx = np.linspace(mu-4*sigma, mu + 4*sigma, 100, retstep=True)
    y = norm.pdf(x, mu, sigma)

    # multiply all with dx
    area = y[x>=gamma].sum() * dx

    print(f"Area is ~ {area:.3f}")

    plt.plot(x,y, label=f'μ0 = {mu}, σ={sigma}')
    plt.fill_between(x[x>gamma],y[x>gamma], alpha=.5, label=f'area={area:.3f}')
    plt.title("Example")
    plt.legend(loc='best')
    plt.show()

func(0,10,-20)

Now output makes sence, it's Area is ~ 0.978

and the image is

corrected graph

Thanks a lot @Warren Weckesser!!

Upvotes: 1

Related Questions