Reputation: 429
I have read this post:
https://stackoverflow.com/questions/38141951/why-does-scipy-norm-pdf-sometimes-give-pdf-1-how-to-correct-it
but I am still confused.
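import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

x = np.array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091,
              0.1954, 0.6307, 0.6599, 0.1065, 0.0508])
mean = x.mean()
std = x.std()
print('mean:', mean)  # 0.45511999999999986
print('std', std)     # 0.30346538451691657
y = stats.norm.pdf(x, mean, std)
order = np.argsort(x)  # sort by x so the line is drawn left to right
plt.plot(x[order], y[order], c='b')
plt.show()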
Does this mean that the probability of the mean value is 131%?
Given a point, how can I compute the probability of that value? Is this even possible?
Update with my situation:
I understand that for a continuous variable the probability of any single point is 0.
But my users ask me what the probability of 100 is in my data. How can I quantify that?
Upvotes: 3
Views: 1003
Reputation: 80409
As already mentioned, the probability of x being exactly 100 is 0 for a continuous distribution. The right question to ask is something like "what is the probability of x being between 99.5 and 100.5?". This can be calculated by subtracting the cdf values at the two ends of the interval, and it equals the area under the pdf curve over that interval:
from matplotlib import pyplot as plt
import numpy as np
import scipy.stats as stats
x = np.array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091,
0.1954, 0.6307, 0.6599, 0.1065, 0.0508])
mean = x.mean()
std = x.std()
print('mean:', mean) # 0.45511999999999986
print('std', std) # 0.30346538451691657
val = 0.5
eps = 0.05
prob_close_to_val = stats.norm.cdf(val + eps, mean, std) - stats.norm.cdf(val - eps, mean, std)
print(f"probability of being close to {val}: {prob_close_to_val * 100:.2f} %")
# probability of being close to 0.5: 12.95 %
xs = np.linspace(mean - std * 3, mean + std * 3, 200)
ys = stats.norm.pdf(xs, x.mean(), x.std())
plt.plot(xs, ys, c='b')
plt.fill_between(xs, 0, ys, where=(xs >= val - eps) & (xs <= val + eps), color='r', alpha=0.3)
plt.ylim(bottom=0)
plt.margins(x=0)
plt.show()
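To check the claim that the cdf difference equals the area under the curve, here is a minimal sketch that computes the same interval probability twice: once from norm.cdf and once by numerically integrating norm.pdf with scipy.integrate.quad. It assumes the same sample, val=0.5 and eps=0.05 as the code above; quad is used here only as an independent check, not as the recommended way to get the probability.

```python
import numpy as np
import scipy.stats as stats
from scipy.integrate import quad

# Same sample as in the question
x = np.array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091,
              0.1954, 0.6307, 0.6599, 0.1065, 0.0508])
mean, std = x.mean(), x.std()

val, eps = 0.5, 0.05
lo, hi = val - eps, val + eps

# Probability from the difference of the cdf at both ends of the interval
p_cdf = stats.norm.cdf(hi, mean, std) - stats.norm.cdf(lo, mean, std)

# The same probability as the numerically integrated area under the pdf
p_area, _abs_err = quad(stats.norm.pdf, lo, hi, args=(mean, std))

print(f"cdf difference: {p_cdf:.6f}")
print(f"area under pdf: {p_area:.6f}")
```

Both lines should agree to the printed precision; the cdf route is exact and cheaper, so it is the one to use in practice.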
To interpret the value of about 1.3 on the y-axis: the probability of x falling in a small zone of width w around x=0.5 is close to 1.3*w. Choosing w=0.1 then gives 1.3*0.1 = 0.13, or about 13%.
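The "density times width" rule of thumb can be checked numerically. This sketch, assuming the question's sample and a center of val=0.5, compares the exact interval probability (from the cdf) with pdf(val)*w for shrinking widths w; the chosen widths are arbitrary.

```python
import numpy as np
import scipy.stats as stats

x = np.array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091,
              0.1954, 0.6307, 0.6599, 0.1065, 0.0508])
mean, std = x.mean(), x.std()

val = 0.5
density = stats.norm.pdf(val, mean, std)  # about 1.30: a density, not a probability

for w in (0.2, 0.1, 0.01, 0.001):
    # Exact probability of landing in [val - w/2, val + w/2]
    exact = stats.norm.cdf(val + w / 2, mean, std) - stats.norm.cdf(val - w / 2, mean, std)
    # Approximation: density at the center times the width of the zone
    approx = density * w
    print(f"w={w:<6} exact={exact:.6f}  density*w={approx:.6f}  ratio={exact / approx:.4f}")
```

The ratio approaches 1 as w shrinks, which is exactly why the density must be multiplied by a width before it can be read as a probability.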
Upvotes: 3
Reputation: 12407
y is a probability density function (pdf) and x is a continuous variable, so the probability of any single value in a continuous domain is 0. What the pdf value means in a continuous domain is that the probability of the variable lying in the interval (mean-dx/2, mean+dx/2) is approximately 1.314622*dx, assuming dx is small (as dx->0 the ratio between the two tends to 1). In fact, a pdf can even be a delta function with an infinite value at its center, as long as the total area under the pdf is 1. For more information, see Wikipedia: https://en.wikipedia.org/wiki/Probability_density_function
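As a concrete (non-limiting) illustration of the delta-function remark, a very narrow normal distribution already has a peak density far above 1 while its total area stays 1. The scale of 0.001 below is an arbitrary choice for the sketch; integrating over ±10 standard deviations captures essentially all of the mass.

```python
import scipy.stats as stats
from scipy.integrate import quad

# A very narrow normal distribution (scale chosen arbitrarily for illustration)
scale = 0.001
peak = stats.norm.pdf(0.0, loc=0.0, scale=scale)  # 1 / (0.001 * sqrt(2*pi)), about 398.9

# Area under the pdf over +/- 10 standard deviations (practically the whole curve)
area, _abs_err = quad(stats.norm.pdf, -10 * scale, 10 * scale, args=(0.0, scale))

print(f"peak density: {peak:.1f}")
print(f"total area:   {area:.6f}")
```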
Be careful not to confuse it with a probability mass function (pmf) for a discrete random variable, which gives the probability of the variable being exactly equal to a value.
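For contrast, here is a sketch with a discrete distribution, where the function values really are probabilities. The example (number of heads in 10 fair coin flips, via stats.binom) is hypothetical and chosen only for illustration.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical discrete example: number of heads in 10 fair coin flips
k = np.arange(11)
pmf = stats.binom.pmf(k, 10, 0.5)

print(pmf[5])      # P(X == 5) is a genuine probability: 252/1024, about 0.246
print(pmf.sum())   # the point masses add up to 1, so none can exceed 1
```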
Upvotes: 0
Reputation: 117
The peak of about 1.31 in your plot is the value of the probability density function at the mean, i.e., at the peak of the Gaussian.
The probability density function integrates to 1. This does not mean that its values must always be smaller than 1.
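The height of a normal pdf at its mean is 1/(std*sqrt(2*pi)), so any std below 1/sqrt(2*pi) (about 0.399) gives a peak above 1. A minimal sketch, assuming the question's sample and population std (x.std(), i.e. ddof=0), reproduces the 1.3146 peak and checks that the area is still 1 (integrating over mean ± 10 std as a practical stand-in for the whole real line):

```python
import numpy as np
import scipy.stats as stats
from scipy.integrate import quad

x = np.array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091,
              0.1954, 0.6307, 0.6599, 0.1065, 0.0508])
mean, std = x.mean(), x.std()

# Height of a normal pdf at its mean: 1 / (std * sqrt(2*pi))
peak_formula = 1.0 / (std * np.sqrt(2.0 * np.pi))
peak_scipy = stats.norm.pdf(mean, mean, std)

# The whole curve still encloses an area of 1
area, _abs_err = quad(stats.norm.pdf, mean - 10 * std, mean + 10 * std, args=(mean, std))

print(f"peak (formula): {peak_formula:.6f}")
print(f"peak (scipy):   {peak_scipy:.6f}")
print(f"total area:     {area:.6f}")
```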
Upvotes: 0