Reputation: 20909
How to calculate probability in normal distribution given mean, std in Python? I can always explicitly code my own function according to the definition like the OP in this question did: Calculating Probability of a Random Variable in a Distribution in Python
Just wondering if there is a library function call that will allow you to do this. I imagine it would look like this:
nd = NormalDistribution(mu=100, std=12)
p = nd.prob(98)
There is a similar question in Perl: How can I compute the probability at a point given a normal distribution in Perl?. But I didn't see one in Python.
Numpy has a random.normal function, but it's like sampling, not exactly what I want.
Upvotes: 131
Views: 330966
Reputation: 589
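For a continuous RV such as the normal, pdf gives the probability density (likelihood) at a value; only for a discrete RV does the analogous function (the pmf) give the probability that the RV takes on a certain value
from scipy.stats import norm
mean, std = 100, 12  # parameters of the example distribution
probability_pdf = norm.pdf(113, loc=mean, scale=std)
print(probability_pdf)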
For a continuous RV, the pdf is the rate of change (derivative) of the cdf, so the cdf value y at a given x (a value in the range [0, 1]) is the integral of the pdf up to x. In practice, just use the cdf method of the frozen rv object in scipy:
rv = norm(100, 12)  # frozen distribution with mean 100 and std 12
from_cdf = rv.cdf(113)
print("cdf: ", "{:.2f}%".format(from_cdf*100))
and check with cdf-inversion
print(norm(100,12).ppf(from_cdf))
Since cdf returns the integral from -inf to x, the result (multiplied by 100) is the percent probability that the RV will not exceed 113 in the distribution of this example.
P.S. We can also calculate the expected value of a function with respect to the distribution by numerical integration, using the expect method:
import numpy as np
print("expect: ", rv.expect(func=lambda x: 113, lb=50, ub=150))
print("expect: ", rv.expect(func=lambda x: 113, lb=-np.inf, ub=+np.inf))
# expectation of the constant 113 over [50, 150]: 112.99650732890463
# expectation of the constant 113 over (-inf, inf): 113.00000000000003
# (113 weighted by the probability mass inside the bounds)
The density function is constructed so that its integral between two points (the difference of the cdf values at those points), or in higher dimensions over a particular area or volume, gives the probability that a continuous random variable falls within those points (or that area or volume).
Upvotes: 0
Reputation: 397
I would like to say: the questioner is asking "How to calculate the likelihood of a given data point in a normal distribution given mean & standard deviation?" instead of "How to calculate probability in a normal distribution given mean & standard deviation?".
For "probability", it must be between 0 and 1, but for "likelihood", it must be non-negative (not necessarily between 0 and 1).
You could use multivariate_normal.pdf(x, mean=mean_vec, cov=cov_matrix) in scipy.stats.multivariate_normal to calculate it.
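To make the distinction between a density and a probability concrete, here is a minimal sketch using the standard-library NormalDist (Python 3.8+) rather than scipy. The mean 0 and standard deviation 0.1 are illustrative values chosen only so that the density rises above 1:

```python
from statistics import NormalDist

# A narrow normal distribution (illustrative parameters, not from the question).
narrow = NormalDist(mu=0, sigma=0.1)

# The density at the mean is a likelihood; it is not bounded by 1.
density_at_mean = narrow.pdf(0)

# A probability is an area under the density and always lies in [0, 1].
prob_near_mean = narrow.cdf(0.05) - narrow.cdf(-0.05)

print(density_at_mean)  # roughly 3.99
print(prob_near_mean)   # roughly 0.38
```

The density is large because the distribution is concentrated on a short interval, while the probability of landing in any interval still cannot exceed 1.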
Upvotes: 2
Reputation: 7972
Note that probability is different from probability density pdf(), which some of the previous answers refer to. For a continuous variable, the probability of any single exact value is zero; the probability density describes how probability is concentrated near a value, and it only becomes a probability over a range. So to obtain the probability you need to compute the integral of the probability density function over a given interval. As an approximation, you can multiply the probability density by the width of a small interval you're interested in, and that will give you approximately the probability of falling in that interval.
import numpy as np
from scipy.stats import norm
data_start = -10
data_end = 10
data_points = 21
data = np.linspace(data_start, data_end, data_points)
point_of_interest = 5
mu = np.mean(data)
sigma = np.std(data)
interval = (data_end - data_start) / (data_points - 1)
probability = norm.pdf(point_of_interest, loc=mu, scale=sigma) * interval
The code above will give you the approximate probability that the variable falls within an interval of width 1 around 5, for a normal distribution whose mean and standard deviation are estimated from 21 evenly spaced data points between -10 and 10 (meaning interval is 1). You can play around with a fixed interval value, depending on the results you want to achieve.
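As a rough check of how good the density-times-width approximation is, the sketch below compares it with the exact probability from the cdf. It uses the standard-library statistics module instead of numpy and scipy, and pstdev is assumed here because it matches the population standard deviation that np.std computes by default:

```python
from statistics import NormalDist, mean, pstdev

# Same 21 evenly spaced points as above: -10, -9, ..., 10.
data = list(range(-10, 11))
dist = NormalDist(mean(data), pstdev(data))

point_of_interest = 5
width = 1  # interval width centred on the point of interest

# Approximation: density at the point times the interval width.
approx = dist.pdf(point_of_interest) * width

# Exact probability of the same interval, from the cdf.
exact = dist.cdf(point_of_interest + width / 2) - dist.cdf(point_of_interest - width / 2)

print(approx, exact)
```

The two values agree closely here because the density changes little across an interval of width 1; the approximation degrades as the interval gets wider relative to sigma.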
Upvotes: 6
Reputation: 23
I wrote this program to do the math for you. Just enter in the summary statistics. No need to provide an array:
One-Sample Z-Test for a Population Proportion:
To do this for mean rather than proportion, change the formula for z accordingly
EDIT:
Here is the content from the link:
import math
import scipy.stats as stats

def one_sample_ztest_pop_proportion(tail, p, pbar, n, alpha):
    # Calculate the test statistic
    sigma = math.sqrt((p*(1-p))/(n))
    z = round((pbar - p) / sigma, 2)
    # Probability of a sample proportion at or below pbar under the null hypothesis
    lower = stats.norm(p, sigma).cdf(pbar)
    if tail == 'lower':
        pval = round(lower, 4)
        print("Results for a lower tailed z-test: ")
    elif tail == 'upper':
        pval = round(1 - lower, 4)
        print("Results for an upper tailed z-test: ")
    elif tail == 'two':
        # Double the smaller tail so the test works whether pbar is below or above p
        pval = round(2 * min(lower, 1 - lower), 4)
        print("Results for a two tailed z-test: ")
    else:
        raise ValueError("tail must be 'lower', 'upper' or 'two'")
    # Print test results
    print("Test statistic = {}".format(z))
    print("P-value = {}".format(pval))
    print("Significance level = {}".format(alpha))
    # Compare p-value to significance level
    if pval <= alpha:
        print("{} <= {}. Reject the null hypothesis.".format(pval, alpha))
    else:
        print("{} > {}. Do not reject the null hypothesis.".format(pval, alpha))

#one_sample_ztest_pop_proportion('upper', .20, .25, 400, .05)
#one_sample_ztest_pop_proportion('two', .64, .52, 100, .05)
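For the mean rather than the proportion, one possible adaptation is sketched below. The function name one_sample_ztest_mean and its parameters are hypothetical, the population standard deviation sigma is assumed to be known, and the standard-library NormalDist is used in place of scipy:

```python
from math import sqrt
from statistics import NormalDist

def one_sample_ztest_mean(tail, mu0, xbar, sigma, n):
    """Return (z, p_value) for a one-sample z-test of a mean with known sigma."""
    standard_error = sigma / sqrt(n)
    z = (xbar - mu0) / standard_error
    # Probability of a standard normal value at or below z.
    lower = NormalDist().cdf(z)
    if tail == 'lower':
        p_value = lower
    elif tail == 'upper':
        p_value = 1 - lower
    elif tail == 'two':
        p_value = 2 * min(lower, 1 - lower)
    else:
        raise ValueError("tail must be 'lower', 'upper' or 'two'")
    return z, p_value

# Illustrative numbers only: sample mean 105 from n=36, null mean 100, sigma 15.
z, p_value = one_sample_ztest_mean('upper', mu0=100, xbar=105, sigma=15, n=36)
print(z, p_value)
```

Returning the statistic and p-value instead of printing them leaves the comparison against the significance level to the caller.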
Upvotes: 1
Reputation: 141
In case you would like to find the area between 2 values of x, for example with mean = 1 and standard deviation = 2, the probability of x falling in [0.5, 2]:
import scipy.stats
scipy.stats.norm(1, 2).cdf(2) - scipy.stats.norm(1,2).cdf(0.5)
Upvotes: 14
Reputation: 61666
Starting with Python 3.8, the standard library provides the NormalDist object as part of the statistics module.
It can be used to get the probability density function (pdf - likelihood that a random sample X will be near the given value x) for a given mean (mu) and standard deviation (sigma):
from statistics import NormalDist
NormalDist(mu=100, sigma=12).pdf(98)
# 0.032786643008494994
Also note that the NormalDist object provides the cumulative distribution function (cdf - probability that a random sample X will be less than or equal to x):
NormalDist(mu=100, sigma=12).cdf(98)
# 0.43381616738909634
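Building on the cdf, a short sketch of two related operations with the same object: the probability of an interval as a difference of cdf values, and inverting the cdf with inv_cdf. The bounds 88 and 112 are illustrative (one standard deviation either side of the mean):

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=12)

# Probability that a sample falls between 88 and 112 (within one sigma of the mean).
p_between = dist.cdf(112) - dist.cdf(88)

# inv_cdf maps a cumulative probability back to the corresponding x value.
x_recovered = dist.inv_cdf(dist.cdf(98))

print(p_between)    # roughly 0.68
print(x_recovered)  # roughly 98
```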
Upvotes: 34
Reputation: 2445
Scipy.stats is a great module. Just to offer another approach, you can calculate it directly using
import math
def normpdf(x, mean, sd):
    var = float(sd)**2
    denom = (2*math.pi*var)**.5
    num = math.exp(-(float(x)-float(mean))**2/(2*var))
    return num/denom
This uses the formula found here: http://en.wikipedia.org/wiki/Normal_distribution#Probability_density_function
to test:
>>> normpdf(7,5,5)
0.07365402806066466
>>> from scipy.stats import norm
>>> norm(5,5).pdf(7)
0.073654028060664664
Upvotes: 61
Reputation: 679
Here is more info. First you are dealing with a frozen distribution (frozen in this case means its parameters are set to specific values). To create a frozen distribution:
import scipy.stats
scipy.stats.norm(loc=100, scale=12)
#where loc is the mean and scale is the std dev
#if you wish to pull out a random number from your distribution
scipy.stats.norm.rvs(loc=100, scale=12)
#To find the probability that the variable has a value LESS than or equal to,
#let's say, 113, you'd use the CDF (Cumulative Distribution Function)
scipy.stats.norm.cdf(113,100,12)
Output: 0.86066975255037792
#or 86.07% probability
#To find the probability that the variable has a value GREATER than or
#equal to let's say 125, you'd use SF Survival Function
scipy.stats.norm.sf(125,100,12)
Output: 0.018610425189886332
#or 1.86%
#To find the variate for which the probability is given, let's say the
#value below which 98% of the probability lies, you'd use the
#PPF (Percent Point Function)
scipy.stats.norm.ppf(.98,100,12)
Output: 124.64498692758187
Upvotes: 50
Reputation: 77
The formula cited from Wikipedia in the other answers cannot be used directly to calculate normal probabilities. You would have to write a numerical integration approximation function using that formula in order to calculate the probability.
That formula computes the value of the probability density function. Since the normal distribution is continuous, you have to compute an integral to get probabilities. The Wikipedia page mentions the CDF, which does not have a closed form for the normal distribution.
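A minimal sketch of such a numerical integration, using composite Simpson's rule over the density formula. The function names and the default of 1000 steps are illustrative choices, not part of any library:

```python
import math

def normal_pdf(x, mean, sd):
    """Density of the normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def normal_prob_between(a, b, mean, sd, steps=1000):
    """Approximate P(a <= X <= b) with composite Simpson's rule."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of sub-intervals
    h = (b - a) / steps
    total = normal_pdf(a, mean, sd) + normal_pdf(b, mean, sd)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # odd interior points weigh 4, even ones 2
        total += weight * normal_pdf(a + i * h, mean, sd)
    return total * h / 3

# Probability that X lies in [0.5, 2] for mean 1 and standard deviation 2.
print(normal_prob_between(0.5, 2, mean=1, sd=2))  # roughly 0.29
```

The result should match the difference of two cdf values for the same bounds to many decimal places, since the density is smooth over a short interval.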
Upvotes: 4
Reputation: 1
You can just use the error function that's built in to the math library, as stated on their website.
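A small sketch of that approach, assuming the standard relation between the normal CDF and the error function, Phi(z) = (1 + erf(z / sqrt(2))) / 2. The helper name normal_cdf is made up for illustration:

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, computed via math.erf."""
    z = (x - mean) / (sd * math.sqrt(2))
    return 0.5 * (1 + math.erf(z))

# Same example as the question: mean 100, standard deviation 12.
print(normal_cdf(98, 100, 12))  # roughly 0.4338

# Probability of an interval is a difference of two CDF values.
print(normal_cdf(113, 100, 12) - normal_cdf(98, 100, 12))
```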
Upvotes: 0
Reputation: 353059
There's one in scipy.stats:
>>> import scipy.stats
>>> scipy.stats.norm(0, 1)
<scipy.stats.distributions.rv_frozen object at 0x928352c>
>>> scipy.stats.norm(0, 1).pdf(0)
0.3989422804014327
>>> scipy.stats.norm(0, 1).cdf(0)
0.5
>>> scipy.stats.norm(100, 12)
<scipy.stats.distributions.rv_frozen object at 0x928352c>
>>> scipy.stats.norm(100, 12).pdf(98)
0.032786643008494994
>>> scipy.stats.norm(100, 12).cdf(98)
0.43381616738909634
>>> scipy.stats.norm(100, 12).cdf(100)
0.5
[One thing to beware of -- just a tip -- is that the parameter passing is a little broad. Because of the way the code is set up, if you accidentally write scipy.stats.norm(mean=100, std=12) instead of scipy.stats.norm(100, 12) or scipy.stats.norm(loc=100, scale=12), then it'll accept it, but silently discard those extra keyword arguments and give you the default (0,1).]
Upvotes: 170