Reputation: 1081
I am trying to write an algorithm that determines $\mu$, $\sigma$, and $\pi$ for each class of a multivariate normal mixture distribution.
I have the algorithm partially working: it converges when I set the initial guesses ($\mu$, $\sigma$, $\pi$) near the real values, but when I set them far from the real ones it does not converge, and sigma goes to 0 (2.30760684053766e-24, 2.30760684053766e-24).
I think the problem is my covariance calculation; I am not sure it is done the right way. I found the formula on Wikipedia. I would be grateful if you could check my algorithm, especially the covariance part.
Thanks, and have a nice day!
% 2-component Gaussian mixture
% size(x) = [400, 2]   (400 points, 2-dimensional)
% mu is 2x2            (row 1 = first component's mean, row 2 = second component's mean)
for i = 1 : k
    % weight each point by the component prior times the component density
    gaussEvaluation(i, :) = pInit(i) * mvnpdf(x, muInit(i, :), sigmaInit(i, :) * eye(d));
    gaussEvaluationSum = sum(gaussEvaluation(i, :));

    % mu calculation
    for j = 1 : d
        mu(i, j) = sum(gaussEvaluation(i, :) * x(:, j)) / gaussEvaluationSum;
    end

    % sigma calculation, method 1
    %for j = 1 : n
    %    v = (x(j, :) - muNew(i, :));
    %    sigmaNew(i) = sigmaNew(i) + gaussEvaluation(i, j) * (v * v');
    %end
    %sigmaNew(i) = sigmaNew(i) / gaussEvaluationSum;

    % sigma calculation, method 2
    sub = bsxfun(@minus, x, mu(i, :));
    sigma(i, :) = sum(gaussEvaluation(i, :) * (sub .* sub)) / gaussEvaluationSum;

    % p calculation
    p(i) = gaussEvaluationSum / n;
end
Upvotes: 2
Views: 227
Reputation: 36900
Two points: you can observe this behavior even when you implement Gaussian-mixture EM correctly, but in your case the code does also appear to be incorrect.
First, this is a problem you simply have to deal with when fitting mixtures of Gaussians: sometimes one component of the mixture collapses onto a single point, so that the mean of that component becomes the point and its variance goes to 0. This is known as a 'singularity', and it drives the likelihood to infinity.
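To see concretely why the likelihood diverges, consider a component whose mean lands exactly on one data point, $\mu_k = x_n$, with isotropic covariance $\sigma_k^2 I$ (the notation here is illustrative, following Bishop):

$$\mathcal{N}(x_n \mid \mu_k = x_n,\ \sigma_k^2 I) = \frac{1}{(2\pi)^{d/2}\,\sigma_k^{d}} \longrightarrow \infty \quad \text{as } \sigma_k \to 0,$$

so this single term, and with it the whole likelihood, can be made arbitrarily large.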
Check out slide 42 of this deck: http://www.cs.ubbcluj.ro/~csatol/gep_tan/Bishop-CUED-2006.pdf
The likelihood function you are maximizing is not log-concave, so the EM algorithm is not guaranteed to converge to the same parameters from different initial values. The link above also gives some solutions to this over-fitting problem, such as putting a prior or a regularization term on your parameters (a minimal version of that idea is sketched below). You can also run the algorithm several times with different starting parameters and discard any result with a zero-variance component as over-fitted, or simply reduce the number of components you are using.
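As a minimal sketch of the regularization idea, written in the style of your code (the floor value minVar is my assumption, not something prescribed by the slides; tune it to the scale of your data):

    % After each M-step, clamp every variance away from zero so that no
    % component can collapse onto a single point (hypothetical floor value).
    minVar = 1e-6;
    sigma(i, :) = max(sigma(i, :), minVar);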
In your case, the equation itself is right: the covariance update on Wikipedia is the same as the one on slide 45 of the above link. However, in a 2-d space, each component's mean should be a length-2 vector and its covariance a 2x2 matrix. Your code (for two components) is therefore wrong: a 2x2 matrix to store the means is fine, but a 2x2 matrix cannot store the covariances; they should be a 2x2x2 array (one 2x2 covariance per component).
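For illustration, a minimal sketch of a full-covariance M-step with that layout. It assumes gamma is a k-by-n matrix of responsibilities (the properly normalized posterior weights, which your unnormalized gaussEvaluation is standing in for), with x, mu, d, k as in your code:

    % One d-by-d covariance matrix per component, stored along the third dimension.
    sigma = zeros(d, d, k);
    for i = 1 : k
        Nk = sum(gamma(i, :));                 % effective number of points in component i
        sub = bsxfun(@minus, x, mu(i, :));     % n-by-d deviations from the component mean
        % Weighted scatter: sum_j gamma(i,j) * sub(j,:)' * sub(j,:), divided by Nk
        sigma(:, :, i) = (bsxfun(@times, gamma(i, :)', sub)' * sub) / Nk;
    end

With this layout, the density evaluation becomes mvnpdf(x, mu(i, :), sigma(:, :, i)), so each component carries its own full covariance matrix instead of a single row of variances.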
Upvotes: 2