machinery

Reputation: 6290

How to calculate probability of a point using a probability distribution object?

I'm building on my previous question because there is a further issue.

In MATLAB, I have fitted a normal distribution to my data vector: PD = fitdist(data,'normal'). Now I have a new data point coming in (e.g. x = 0.5) and I would like to calculate its probability.

Using cdf(PD,x) will not work because it gives the probability that the point is less than or equal to x (not exactly x). Using pdf(PD,x) gives only the density, not a probability, so it can be greater than one.

How can I calculate the probability?

Upvotes: 2

Views: 4612

Answers (2)

Matthew Gunn

Reputation: 4519

Let's say you have a random variable X that follows the normal distribution with mean mu and standard deviation s.

Let F be the cumulative distribution function for the normal distribution with mean mu and standard deviation s. The probability that the random variable X falls between a and b is P(a < X <= b) = F(b) - F(a).

In Matlab code:

P_a_b = normcdf(b, mu, s) - normcdf(a, mu, s);

Note: the probability that X is exactly equal to 0.5 (or any other specific value) is zero! A range of outcomes can have positive probability, but any single outcome, and indeed any countable set of individual outcomes, has probability zero.
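To connect this to the fitted object from the question, here is a minimal sketch that uses cdf(PD, ...) instead of normcdf. The synthetic data, the seed, and the half-widths h are assumptions made purely for illustration, and the Statistics and Machine Learning Toolbox is assumed to be available. It shows the interval probability shrinking toward zero as the interval around x narrows, while staying close to density times width.

```matlab
% Sketch only: synthetic data stands in for the asker's data vector.
rng(0);                                  % fixed seed so the run is repeatable
data = 0.3 + 0.2 * randn(1000, 1);       % hypothetical sample
PD = fitdist(data, 'Normal');            % fitted distribution object

x = 0.5;                                 % the new data point
for h = [0.1 0.01 0.001]                 % hypothetical half-widths around x
    p = cdf(PD, x + h) - cdf(PD, x - h); % P(x - h < X <= x + h)
    approx = pdf(PD, x) * 2 * h;         % density times interval width
    fprintf('h = %g: P = %.6f, pdf*2h = %.6f\n', h, p, approx);
end
```

For small h the two printed columns should nearly agree, which is the sense in which the density describes probability per unit length rather than probability at a point.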

Upvotes: 1

John Coleman

Reputation: 51998

If the distribution is continuous then the probability of any point x is 0, almost by definition of continuous distribution. If the distribution is discrete and, furthermore, the support of the distribution is a subset of the set of integers, then for any integer x its probability is

cdf(PD,x) - cdf(PD,x-1)

More generally, for any random variable X which takes on integer values, the probability mass function f(x) and the cumulative distribution F(x) are related by

f(x) = F(x) - F(x-1)

The right hand side can be interpreted as a discrete derivative, so this is a direct analog of the fact that in the continuous case the pdf is the derivative of the cdf.

I'm not sure if MATLAB has a more direct way to get at the probability mass function in your situation than going through the cdf like that.
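As a quick check of the relation f(x) = F(x) - F(x-1), here is a small sketch using a Poisson distribution object. The choice of Poisson and of lambda = 3 is an assumption for illustration only. For the toolbox's built-in discrete distributions, pdf should return the probability mass itself, so the two quantities below are expected to agree.

```matlab
% Sketch only: a Poisson object stands in for some integer-valued distribution.
PD = makedist('Poisson', 'lambda', 3);   % hypothetical discrete distribution

x = 2;                                   % an integer in the support
p_cdf = cdf(PD, x) - cdf(PD, x - 1);     % F(x) - F(x-1)
p_pmf = pdf(PD, x);                      % mass at x for a discrete object

fprintf('via cdf: %.6f, via pdf: %.6f\n', p_cdf, p_pmf);
```

If the two values match, then calling pdf on a discrete distribution object is the more direct route. For a continuous object such as the fitted normal, pdf is still only a density.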

In the continuous case, your question doesn't make a lot of sense since, as I said above, the probability is 0. Non-zero probability in this case is something that attaches to intervals rather than individual points. You still might want to ask for the probability of getting a value near x -- but then you have to decide on what you mean by "near". For example, if x is an integer then you might want to know the probability of getting a value that rounds to x. That would be:

cdf(PD, x + 0.5) - cdf(PD, x - 0.5)
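A short sketch of that rounding idea, applied to a whole range of integers at once. The normal parameters and the integer range are made up for illustration. Since every real value rounds to exactly one integer, the resulting probabilities should add up to (almost) one once the range covers nearly all of the distribution's mass.

```matlab
% Sketch only: parameters are hypothetical, not fitted to real data.
PD = makedist('Normal', 'mu', 2.3, 'sigma', 1.5);

xs = -10:15;                                  % integers covering nearly all the mass
p  = cdf(PD, xs + 0.5) - cdf(PD, xs - 0.5);   % P(value rounds to each integer)

[~, k] = max(p);                              % index of the most likely rounded value
fprintf('most likely integer: %d, total probability: %.6f\n', xs(k), sum(p));
```

cdf accepts a vector of evaluation points, so no loop is needed here. The most likely integer is the one nearest the mean.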

Upvotes: 4
