Understanding Matlab example fit a Mixture of Two Normals distribution

Question

I am following the example to fit a Mixture of Two Normals distribution that you can find here.

x = [trnd(20,1,50) trnd(4,1,100)+3];
hist(x,-2.25:.5:7.25);

pdf_normmixture = @(x,p,mu1,mu2,sigma1,sigma2) ...
                         p*normpdf(x,mu1,sigma1) + (1-p)*normpdf(x,mu2,sigma2);

pStart = .5;
muStart = quantile(x,[.25 .75])
sigmaStart = sqrt(var(x) - .25*diff(muStart).^2)
start = [pStart muStart sigmaStart sigmaStart];

lb = [0 -Inf -Inf 0 0];
ub = [1 Inf Inf Inf Inf];

options = statset('MaxIter',300, 'MaxFunEvals',600);
paramEsts = mle(x, 'pdf',pdf_normmixture, 'start',start, ...
                          'lower',lb, 'upper',ub, 'options',options)


bins = -2.5:.5:7.5;
h = bar(bins,histc(x,bins)/(length(x)*.5),'histc');
h.FaceColor = [.9 .9 .9];
xgrid = linspace(1.1*min(x),1.1*max(x),200);
pdfgrid = pdf_normmixture(xgrid,paramEsts(1),paramEsts(2),paramEsts(3),paramEsts(4),paramEsts(5));
hold on
plot(xgrid,pdfgrid,'-')
hold off
xlabel('x')
ylabel('Probability Density')

Could you please explain why, when it calculates

h = bar(bins,histc(x,bins)/(length(x)*.5),'histc');

it divides by (length(x)*.5)?

Dan · Accepted Answer

The idea is to scale your histogram so that it represents probability density instead of counts. This is the unscaled histogram:

The vertical axis is the count of how many events fall within each bin. You have defined your bin centers to be -2.25:.5:7.25, so your bin width is 0.5. Looking at the first bar of the histogram, it tells us that the number of elements in x (or the number of events in your experiment) that fall in the bin from -2.5 to -2 (note the width of 0.5) is 2.
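To make the counting concrete, here is a minimal sketch using a tiny made-up data vector (the values in x below are hypothetical, chosen only for illustration, and are not the data from the question). It uses histc with bin edges spaced 0.5 apart, as in the question's bar call:

```matlab
% Hypothetical toy data, chosen so the counts are easy to check by hand
x = [-2.4 -2.1 -1.7 0.3];

% Bin edges spaced 0.5 apart; bin k covers edges(k) <= x < edges(k+1)
edges = -2.5:.5:0.5;

% counts(k) is the number of elements of x in bin k.
% The last entry counts only values exactly equal to edges(end).
counts = histc(x, edges);

disp(counts)   % the first entry is 2: both -2.4 and -2.1 lie in [-2.5, -2)
```

Every element of x lands in exactly one bin here, so the counts add up to numel(x).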

But now you want to compare this with a probability density function (PDF), and we know that the integral of a PDF is 1. This is the same as saying that the area under the PDF curve is 1. So if we want our histogram's vertical scale to match that of the PDF, as in this second picture,

we need to scale it such that the total area of all the histogram's bars sums to 1. The area of the first bar of the histogram is height times width, which according to our investigation above is 2*0.5. The width is the same for all the bins in the histogram, so we can find the total area by adding up all the bar heights and then multiplying by the width once at the end. The sum of all the heights in the histogram is the total number of events, which is the total number of elements in x, or length(x) (provided every element falls inside one of the bins). Thus the area of the first (unscaled) histogram is length(x)*0.5, and to make this area equal to 1 we need to scale all the bar heights by dividing them by length(x)*0.5.
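The area argument above can be checked numerically. The following is a rough sketch, not the original example: it assumes randn as a stand-in for the trnd samples so that it runs without the Statistics Toolbox, and the variable names (edges, binWidth, heights, totalArea) are my own.

```matlab
% Assumption: randn replaces the trnd samples from the question
rng(0);                                    % fixed seed so the run is repeatable
x = [randn(1,50) randn(1,100)+3];

edges    = -2.5:.5:7.5;                    % same bin edges as in the question
binWidth = edges(2) - edges(1);            % 0.5

counts  = histc(x, edges);                 % raw counts per bin
heights = counts / (length(x)*binWidth);   % the scaling used in the bar call

% Total area of the scaled bars: sum of heights times the common width
totalArea = sum(heights) * binWidth;

% Fraction of samples that actually fell inside the edges
fractionInRange = sum(counts) / numel(x);

fprintf('Total bar area:    %.4f\n', totalArea);
fprintf('Fraction in range: %.4f\n', fractionInRange);
```

The two printed numbers agree, and both equal 1 whenever every sample lies between edges(1) and edges(end); histc does not count samples outside that range, so the area drops slightly below 1 if there are any. In recent MATLAB releases, histogram(x,edges,'Normalization','pdf') should apply the same kind of scaling for you.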
