vincentlai

Reputation: 429

How to get the probability of a value given samples in Python?


I referred to this post: https://stackoverflow.com/questions/38141951/why-does-scipy-norm-pdf-sometimes-give-pdf-1-how-to-correct-it

But I still have some confusion

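import numpy as np
import scipy.stats as stats
from matplotlib import pyplot as plt

x = np.array([ 0.7972,  0.0767,  0.4383,  0.7866,  0.8091,
               0.1954,  0.6307,  0.6599,  0.1065,  0.0508])

# Sample mean and standard deviation used to parameterize the normal pdf
mean = x.mean()
std = x.std()
print('mean:', mean)  # 0.45511999999999986
print('std:', std)  # 0.30346538451691657

# Density of the fitted normal distribution at each sample point
y = stats.norm.pdf(x, mean, std)
plt.plot(x, y, c='b')
plt.show()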


Does this mean that the probability of the mean value is 131%?
Given a point, how do I compute the probability of that value? Is this possible?

Adding my situation:
I understand that for a continuous variable the probability of any single point is 0.
But my users ask me what the probability of 100 is in my data. How can I quantify that for the value 100?

Upvotes: 3

Views: 1003

Answers (3)

JohanC

Reputation: 80409

As already mentioned, the probability of x being exactly 100 is 0 in a continuous distribution. The correct question to ask is something like "what is the probability of x being between 99.5 and 100.5?". This can be calculated by subtracting the cdf at the lower end of the interval from the cdf at the upper end. It is also equal to the area below the curve over that interval:

from matplotlib import pyplot as plt
import numpy as np
import scipy.stats as stats

x = np.array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091,
              0.1954, 0.6307, 0.6599, 0.1065, 0.0508])

mean = x.mean()
std = x.std()
print('mean:', mean)  # 0.45511999999999986
print('std', std)  # 0.30346538451691657

val = 0.5
eps = 0.05
prob_close_to_val = stats.norm.cdf(val + eps, mean, std) - stats.norm.cdf(val - eps, mean, std)
print(f"probability of being close to {val}: {prob_close_to_val * 100:.2f} %")
# probability of being close to 0.5: 12.95 %

xs = np.linspace(mean - std * 3, mean + std * 3, 200)
ys = stats.norm.pdf(xs, mean, std)
plt.plot(xs, ys, c='b')
plt.fill_between(xs, 0, ys, where=(xs >= val - eps) & (xs <= val + eps), color='r', alpha=0.3)
plt.ylim(bottom=0)  # the old ymin= keyword is not accepted by recent matplotlib versions
plt.margins(x=0)
plt.show()

explanation plot

To interpret the value of 1.3 on the y-axis: the probability of x falling in a small zone of width w around x=0.5 is close to 1.3*w. Choosing w=0.1 then gives 1.3*0.1 = 0.13, or about 13%.

Upvotes: 3

Ehsan

Reputation: 12407

y is a probability density function (pdf) and x is a continuous variable, so the probability of any single value is 0. What the pdf value means in a continuous domain is that the probability of a value lying in the interval (mean-dx/2, mean+dx/2) is approximately 1.314622*dx, assuming dx is small (the ratio of the two tends to 1 as dx->0). In fact, a pdf can even be a delta function, which is infinite at its center, as long as the area under the pdf is 1. For more information, you can check out Wikipedia: https://en.wikipedia.org/wiki/Probability_density_function
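To make the dx argument concrete, here is a minimal sketch that compares pdf(mean)*dx with the exact interval probability for shrinking dx. It assumes the mean and std printed in the question, and it swaps in the standard library's statistics.NormalDist for scipy.stats.norm so that it runs without extra packages; interval_probability is a helper name chosen only for this illustration.

```python
from statistics import NormalDist

# Mean and standard deviation as printed in the question.
mean, std = 0.45512, 0.30346538451691657
dist = NormalDist(mean, std)

# Density at the peak of the curve; this is the value above 1 from the plot.
density_at_mean = dist.pdf(mean)

def interval_probability(dx):
    """Exact probability of landing in (mean - dx/2, mean + dx/2)."""
    return dist.cdf(mean + dx / 2) - dist.cdf(mean - dx / 2)

for dx in (0.2, 0.1, 0.01, 0.001):
    exact = interval_probability(dx)
    approx = density_at_mean * dx
    print(f"dx={dx}: exact={exact:.6f}  pdf*dx={approx:.6f}")
```

The two columns should agree more and more closely as dx gets smaller, while both stay well below 1.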

You should be careful not to mix it up with a probability mass function for a discrete random variable, which gives the probability of the variable being equal to a given value.
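For contrast, a small sketch of a probability mass function, using a binomial distribution as an arbitrary example (10 fair coin flips; this example is not from the question). Here each value really is a probability, so none can exceed 1 and they sum to 1. binomial_pmf is a helper written for this illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Assumed example: number of heads in 10 fair coin flips.
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(f"P(X = 5) = {pmf[5]:.4f}")        # a genuine probability of a single value
print(f"largest pmf value = {max(pmf):.4f}")
print(f"sum of all pmf values = {sum(pmf):.4f}")
```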

Upvotes: 0

Davide Dal Bosco

Reputation: 117

The function you are using computes the value of the probability density function at the mean, i.e., at the peak of the Gaussian.

The probability density function has integral 1. This does not mean that the values of the probability density function must always be smaller than 1.
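A quick numerical check of both statements, as a sketch: it assumes the mean and std printed in the question, uses the standard library's statistics.NormalDist rather than scipy, and approximates the integral with a simple midpoint sum over mean ± 8 standard deviations (the range and number of slices are arbitrary choices).

```python
from statistics import NormalDist

# Mean and standard deviation as printed in the question.
mean, std = 0.45512, 0.30346538451691657
dist = NormalDist(mean, std)

# Midpoint Riemann sum of the pdf over mean +/- 8 standard deviations.
lo, hi, n = mean - 8 * std, mean + 8 * std, 10_000
width = (hi - lo) / n
area = sum(dist.pdf(lo + (i + 0.5) * width) for i in range(n)) * width

peak = dist.pdf(mean)
print(f"peak density: {peak:.4f}")   # larger than 1
print(f"area under pdf: {area:.6f}") # very close to 1
```

The peak exceeds 1 because the curve is narrow (std is about 0.3), so it has to be tall for the total area to come out as 1.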

Upvotes: 0
