Reputation: 6290
I'm building on my previous question because there is a further issue.
I have fitted a normal distribution to my data vector in Matlab: PD = fitdist(data,'normal'). Now a new data point comes in (e.g. x = 0.5) and I would like to calculate its probability.
Using cdf(PD,x) will not work because it gives the probability that a point is less than or equal to x (not exactly x). Using pdf(PD,x) gives only the density, not the probability, so it can be greater than one.
How can I calculate the probability?
Upvotes: 2
Views: 4612
Reputation: 4519
Let's say you have a random variable X that follows the normal distribution with mean mu and standard deviation s.
Let F be the cumulative distribution function of that distribution. The probability that X falls between a and b, that is P(a < X <= b), is F(b) - F(a).
In Matlab code:
P_a_b = normcdf(b, mu, s) - normcdf(a, mu, s);
Note: the probability that X is exactly equal to 0.5 (or any other specific value) is zero! A range of outcomes can have positive probability, but any single outcome, or any countable collection of single outcomes, has probability zero.
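To tie this back to the fitted object from the question, here is a minimal sketch that computes the same interval probability in two ways: directly from the distribution object with cdf, and via normcdf using the fitted parameters. The sample data, the seed, and the bounds a and b are made up for illustration, and the Statistics and Machine Learning Toolbox is assumed to be installed.

```matlab
% Sketch only: synthetic data stands in for the asker's data vector.
rng(0);                                   % fixed seed so the run is repeatable
data = 2 + 0.5*randn(1000, 1);            % hypothetical sample
PD = fitdist(data, 'Normal');             % fitted normal distribution object

a = 1.5;                                  % hypothetical lower bound
b = 2.5;                                  % hypothetical upper bound

% Interval probability straight from the fitted object
P_obj = cdf(PD, b) - cdf(PD, a);

% Same quantity via normcdf with the fitted mean and standard deviation
P_fun = normcdf(b, PD.mu, PD.sigma) - normcdf(a, PD.mu, PD.sigma);

fprintf('P(%g < X <= %g) = %.4f\n', a, b, P_obj);
```

Both routes should agree up to floating-point error, so which one to use is mostly a matter of whether you already have the distribution object at hand.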
Upvotes: 1
Reputation: 51998
If the distribution is continuous then the probability of any point x is 0, almost by definition of a continuous distribution. If the distribution is discrete and, furthermore, the support of the distribution is a subset of the set of integers, then for any integer x its probability is
cdf(PD,x) - cdf(PD,x-1)
More generally, for any random variable X which takes on integer values, the probability mass function f(x) and the cumulative distribution function F(x) are related by
f(x) = F(x) - F(x-1)
The right hand side can be interpreted as a discrete derivative, so this is a direct analog of the fact that in the continuous case the pdf is the derivative of the cdf.
I'm not sure if Matlab has a more direct way to get at the probability mass function in your situation than going through the cdf like that.
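As a quick check of the relation f(x) = F(x) - F(x-1), the sketch below uses a Poisson distribution as an example of an integer-valued distribution (the choice of distribution and of lambda = 3 is an assumption made purely for illustration). As far as I know, calling pdf on a discrete distribution object evaluates its probability mass function, so it can serve as the more direct route; the code compares the two.

```matlab
% Sketch only: Poisson(3) is an arbitrary integer-valued example.
pd = makedist('Poisson', 'lambda', 3);

x = 0:10;                                  % integer points to evaluate

% Probability mass obtained as a discrete difference of the cdf
pmf_from_cdf = cdf(pd, x) - cdf(pd, x - 1);

% Probability mass obtained directly; for a discrete distribution
% pdf returns probabilities rather than densities
pmf_direct = pdf(pd, x);

% Largest disagreement between the two routes
max_diff = max(abs(pmf_from_cdf - pmf_direct));
fprintf('largest difference: %.3g\n', max_diff);
```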
In the continuous case, your question doesn't make a lot of sense since, as I said above, the probability is 0. Non-zero probability in this case is something that attaches to intervals rather than individual points. You still might want to ask for the probability of getting a value near x, but then you have to decide what you mean by "near". For example, if x is an integer then you might want to know the probability of getting a value that rounds to x. That would be:
cdf(PD, x + 0.5) - cdf(PD, x - 0.5)
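The same idea works for any notion of "near". The sketch below picks a small half-width h (an arbitrary choice, not something from the question) and computes the probability of landing within h of x. It also shows how this connects to the density: for small h the interval probability is approximately 2*h*pdf(PD,x), which is why the density alone can exceed one while the probability cannot. A standard normal built with makedist stands in for the fitted distribution.

```matlab
% Sketch only: a standard normal stands in for the fitted PD.
PD = makedist('Normal', 'mu', 0, 'sigma', 1);

x = 0.5;                                   % the new data point from the question
h = 0.01;                                  % hypothetical half-width defining "near"

% Probability of a value within h of x
P_near = cdf(PD, x + h) - cdf(PD, x - h);

% Density-based approximation: interval width times density at x
P_approx = 2*h*pdf(PD, x);

fprintf('P(|X - %g| <= %g) = %.6f (approx. %.6f)\n', x, h, P_near, P_approx);
```

Shrinking h drives the probability toward zero, in line with the point that a single value has probability zero.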
Upvotes: 4