Raaj

Reputation: 403

How do I calculate PDF (probability density function) in Python?

The code below plots the PDF for a particular mean and standard deviation.

Now I need to find the actual probability of a particular value. For example, if my mean is 0 and my value is 0, my probability is 1. This is usually done by calculating the area under the curve, similar to this:

http://homepage.divms.uiowa.edu/~mbognar/applets/normal.html

I am not sure how to approach this problem.

import numpy as np
import matplotlib.pyplot as plt

def normal(power, mean, std, val):
    # Normal density at val; power is expected to be 2 (squared deviation)
    a = 1/(np.sqrt(2*np.pi)*std)
    diff = np.abs(np.power(val-mean, power))
    b = np.exp(-(diff)/(2*std*std))
    return a*b

pdf_array = []
array = np.arange(-2,2,0.1)
print(array)
for i in array:
    print(i)
    pdf = normal(2, 0, 0.1, i)
    print(pdf)
    pdf_array.append(pdf)

plt.plot(array, pdf_array)
plt.ylabel('some numbers')
plt.axis([-2, 2, 0, 5])
plt.show()

Upvotes: 20

Views: 109484

Answers (3)

Ana

Reputation: 165

If you want to write it from scratch:

import math

class PDF():
    def __init__(self,mu=0, sigma=1):
        self.mean = mu
        self.stdev = sigma
        self.data = []

    def calculate_mean(self):
        # True division: floor division (//) would truncate the mean
        self.mean = sum(self.data) / len(self.data)
        return self.mean

    def calculate_stdev(self,sample=True):
        # Sample standard deviation divides by n-1, population by n
        if sample:
            n = len(self.data)-1
        else:
            n = len(self.data)
        mean = self.mean
        sigma = 0
        for el in self.data:
            sigma += (el - mean)**2
        sigma = math.sqrt(sigma / n)
        self.stdev = sigma
        return self.stdev

    def pdf(self, x):
        # Height of the normal density at x (a density, not a probability)
        return (1.0 / (self.stdev * math.sqrt(2*math.pi))) * math.exp(-0.5*((x - self.mean) / self.stdev) ** 2)
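
As a sanity check on the from-scratch formula, Python 3.8+ also ships statistics.NormalDist in the standard library. The sketch below compares a hand-written density against it, using the mean and standard deviation from the question's plot. The helper name normal_pdf and the chosen values are illustrative assumptions, not part of the answer above.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x, written out from the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Illustrative values: mean 0 and standard deviation 0.1, as in the question
dist = NormalDist(mu=0.0, sigma=0.1)
by_hand = normal_pdf(0.0, mu=0.0, sigma=0.1)
by_stdlib = dist.pdf(0.0)

# Both are about 3.99: a density can exceed 1, so it is not a probability
print(by_hand, by_stdlib)
```

The value at the mean is larger than 1 here, which is one way to see that the PDF gives the height of the curve rather than the probability of a value.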



Upvotes: 11

user7345804

Reputation:

The area under a curve y = f(x) from x = a to x = b is the integral of f(x) dx from x = a to x = b, and SciPy has a quick, easy way to compute integrals. Note that the probability of a single point cannot be one, because the total area under the whole curve is one (unless, perhaps, the distribution is a delta function). In fact, for a continuous distribution such as the normal, the area over a single point is zero, so a meaningful probability exists only for an interval of values, and it lies between 0 and 1. There may be different ways of doing it, but a conventional way is to define intervals along the x-axis (for example, confidence intervals) and compute the area over each. I would read up on Gaussian curves and normalization before continuing to code it.
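
The integral described above can be sketched with scipy.integrate.quad, which numerically integrates a function between two limits. The mean, standard deviation and interval below are illustrative assumptions taken from the question's plot, not values supplied by this answer.

```python
from scipy.integrate import quad
from scipy.stats import norm

mean, std = 0.0, 0.1   # illustrative values, as in the question's plot
a, b = -0.1, 0.1       # one standard deviation either side of the mean

# quad calls norm.pdf(x, mean, std) and returns (area, error estimate)
area, abs_err = quad(norm.pdf, a, b, args=(mean, std))
print(area)  # about 0.6827

# Shrinking the interval around 0 shrinks the area towards zero
areas = []
for half_width in (0.1, 0.01, 0.001):
    p, _ = quad(norm.pdf, -half_width, half_width, args=(mean, std))
    areas.append(p)
    print(half_width, p)
```

The shrinking areas illustrate why a single value has no probability of its own: only an interval around it does.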

Upvotes: 5

martinako

Reputation: 2764

Unless you have a reason to implement this yourself, all of these functions are available in scipy.stats.norm.

I think you are asking for the CDF. If so, use this code:

from scipy.stats import norm
print(norm.cdf(x, mean, std))
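
For a concrete run, here is a minimal sketch with assumed values (mean 0 and standard deviation 0.1, as in the question's plot). It shows norm.pdf and norm.cdf side by side, and how subtracting two CDF values gives the probability of an interval.

```python
from scipy.stats import norm

mean, std = 0.0, 0.1  # illustrative values, not from the answer above

# Height of the density curve at 0 (about 3.99, so not a probability)
density_at_0 = norm.pdf(0.0, loc=mean, scale=std)

# P(X <= 0): half of the area lies below the mean
p_below_0 = norm.cdf(0.0, loc=mean, scale=std)

# P(mean - std <= X <= mean + std), about 0.6827
p_within_1sd = norm.cdf(mean + std, mean, std) - norm.cdf(mean - std, mean, std)

print(density_at_0, p_below_0, p_within_1sd)
```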

Upvotes: 17
