Teodor Petrut
Teodor Petrut

Reputation: 63

Normal PDF's integral not equal to one using MATLAB's normpdf

The following methods are supposed to compute a PDF:

bins = [20 23 31.5 57 62.5 89 130];  % classes of values of my random variable

mean = 23;   
std  = mean/2;
values = mean + std*randn(1000,1);  % the actual values of the RV

% method 1

[num, bins] = hist(values, bins);  % histogram on the previously defined bins
pdf2 = num/trapz(bins, num);
trapz(bins, pdf2)  % checking the integral under the curve, which is indeed 1
ans =
 1

% method 2
pdf1 = normpdf(bins, mean, std); % the Matlab command for creating o normal PDF
trapz(bins, pdf1)  % this is NOT equal to 1
ans =
0.7069

However if I consider the bins something like

bins = [0:46];

the results are

ans =
 1
ans =
0.9544

so I still have not the value 1 for the integral in the case of normpdf.

Why does the normpdf not give the integral equal to 1 for the PDFs? Is there something I'm missing in the codes above?

Upvotes: 1

Views: 997

Answers (1)

Holt
Holt

Reputation: 37626

The problem is that you are missing a lot of values from your PDF, if you take bins = [0:46], you have the following curve:

enter image description here

Which means you are missing all the part x < 0 and x > 46, so the integral you are computing is not from -oo to +oo as you expect but from 0 to 46, obvisouly you won't get the correct answer.

Note that you have mean = 23 and std = 11.5, thus if you have bins = 0:46 you have a band around the mean with a width of one std on each side, so according to the 68–95–99.7 rule, 95% of the values lie in this band, which is consistent with the 0.9544 you get.

If you take bins = -11.5:57.5, you now have three standard deviations on each side, and you will get 99.7% of the values in this band (MATLAB gives me 0.9973, the filled area are the ones you did not have with bins = 0:46):

enter image description here

Note that if you want to reach 1.000 with an error better than 10-3, you need about 3.4 standard deviations1:

>> bins = (mean - 3.4 * std):(mean + 3.4 * std) ;
>> pdf  = normpdf(bins, mean, std) ;
>> trapz(bins, pdf)
0.9993

Note that with bins = [20 23 31.5 57 62.5 89 130];, you have both a problem of precision and a problem of missing values (your curve is the blue one, the red one was generated using bins = 20:130):

enter image description here

Obvisouly, if you compute the area under the blue curve, you won't get the value of the area under the red one, and you won't certainly get a value close to the integral of the red curve between -oo and +oo.

Upvotes: 5

Related Questions