I want to get the probability of observing a value X greater than or equal to x_i, that is, P(X >= x_i), which is a cumulative distribution function (CDF) taken from the upper tail. I tried to do it in MATLAB with this code.
Let's assume the data is in the column vector p1.
xp1 = linspace(min(p1), max(p1)); %range of bins
histp1 = histc(p1(:), xp1); %histogram of data
probp1 = histp1/sum(histp1); %PDF (probability distribution function)
figure;plot(probp1, 'o')
Now I want to calculate the CDF:
sorncount = flipud(histp1);
cumsump1 = cumsum(sorncount);
normcumsump1 = cumsump1/max(cumsump1);
cdf = flipud(normcumsump1);
figure;plot(xp1, cdf, 'ok');
Can anyone tell me whether this is correct, or am I doing something wrong?
Upvotes: 2
Views: 2409
Your code works correctly, but is a bit more complicated than it could be. Since probp1 has been normalized to have sum equal to 1, the maximum of its cumulative sum is guaranteed to be 1, so there is no need to divide by this maximum. This shortens the code a bit:
xp1 = linspace(min(p1), max(p1)); %range of bins
histp1 = histc(p1(:), xp1); %count for each bin
probp1 = histp1/sum(histp1); %PDF (probability distribution function)
cdf = flipud(cumsum(flipud(probp1))); %CDF (unconventional, of P(X>=a) kind)
As Raab70 noted, most of the time the CDF is understood as P(X<=a), in which case you don't need flipud: taking cumsum(probp1) is all that's needed.
Also, I would probably use probp1(end:-1:1) instead of flipud(probp1), so that the vector is flipped whether it is a row or a column.
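To make the two conventions concrete, here is a minimal sketch on a small made-up sample. The data in p1, the choice of 5 bin edges, and the names cdfGE and cdfLE are assumptions for illustration only; they are not from the question. The sketch normalizes the counts first, so both cumulative sums end at 1 without any extra division.

```matlab
% Hypothetical sample data (not from the original post)
p1 = [1; 2; 2; 3; 3; 3; 4; 4; 5; 5];

xp1    = linspace(min(p1), max(p1), 5);  % 5 bin edges: 1 2 3 4 5
histp1 = histc(p1(:), xp1);              % count for each bin
histp1 = histp1(:);                      % force a column, whatever histc returned
probp1 = histp1 / sum(histp1);           % fraction of samples in each bin

% P(X >= a): accumulate starting from the upper end, then flip back
cdfGE = cumsum(probp1(end:-1:1));
cdfGE = cdfGE(end:-1:1);                 % approximately [1.0 0.9 0.7 0.4 0.2]'

% P(X <= a): the conventional CDF needs no flipping
cdfLE = cumsum(probp1);                  % approximately [0.1 0.3 0.6 0.8 1.0]'

% Why end:-1:1 is safer than flipud: flipud leaves a row vector unchanged
r = [1 2 3];
rFlipud = flipud(r);                     % still [1 2 3]
rIndex  = r(end:-1:1);                   % [3 2 1]
```

The names cdfGE and cdfLE are used here instead of cdf only to avoid shadowing a function of that name if the Statistics Toolbox is installed. Note that with binned data the values are P(X >= a) and P(X <= a) only at bin resolution.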
Upvotes: 1