Reputation: 3437
I know that the Gaussian mixture model (GMM) is a generalization of k-means, and thus should be more accurate. But I cannot tell from the clustered images below why the k-means results are more accurate in certain regions (e.g. the speckle noise, shown as light-blue dots, persists in the river in the GMM result but not in the k-means result).
Below is the MATLAB code for both methods:
% k-means: 2 clusters, best of 5 random initializations
L1 = kmeans(X, 2, 'Replicates', 5);
kmeansClusters = reshape(L1, [numRows numCols]);
figure('name', 'K-means clustering')
imshow(label2rgb(kmeansClusters))
% Gaussian mixture model: fit 2 components, then assign each pixel
gmm = fitgmdist(X, 2);
L2 = cluster(gmm, X);
gmmClusters = reshape(L2, [numRows numCols]);
figure('name', 'GMM clustering')
imshow(label2rgb(gmmClusters))
And in the following are shown the original image, as well as the clustered results:
Original image:
K-means:
Gaussian Mixture Model:
P.S.: I'm clustering using the intensity information alone, and the number of clusters is 2 (i.e. water and land).
Upvotes: 3
Views: 3480
Reputation: 143
I think this is an interesting question/problem, so I spent a bit of time playing around.
First off, the assumption that a Gaussian mixture model should be more accurate than k-means is not necessarily true. The two methods make different assumptions, and while a GMM is more flexible, there is no rule that it must always perform better, particularly on something as subjective as image classification.
With k-means clustering you assign each pixel to one of two buckets purely based on its distance from the mean (centroid) of that bucket. If I look at the speckle noise in the river, its intensity values fall between the two centroids. Plotting the histogram of the image and superimposing the positions of the centroids and of the speckle noise, I get this:
You can see that the speckle noise is closer to the centroid of the darker stuff (water), so it is assigned to the water bucket. This is basically the same thing as a Gaussian mixture model with equal variance and equal weight.
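To make that concrete, here is a minimal sketch of the nearest-centroid decision in 1-D (the speckle intensity value here is a made-up assumption, not taken from the question's data; X is the same intensity vector as in the question):

```matlab
% k-means in 1-D reduces to a nearest-centroid rule with a midpoint threshold
[~, C] = kmeans(X, 2, 'Replicates', 5);   % C holds the two centroids
speckleVal = 90;                          % hypothetical speckle intensity
[~, assigned] = min(abs(speckleVal - C)); % index of the nearest centroid
threshold = mean(C);                      % the implied decision boundary
```

Any intensity on the darker side of `threshold` goes to the water bucket, no matter how tightly the water intensities are actually distributed.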
One of the advantages of a GMM is the ability to consider the variance of the two categories. Instead of simply finding two centroids and drawing a line between them to separate your categories, the GMM finds two Gaussians that best fit your data. This is a really good example image because you can clearly see two dominant shapes: one that's tall and skinny and one that's short and broad. The GMM algorithm sees the data as this:
Here you can see that the speckle noise clearly falls within the broad variance of the land pdf.
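You can reproduce a picture like this with a short sketch that overlays the two fitted components on the intensity histogram (assuming X is an N-by-1 vector of intensities, as in the question):

```matlab
gmm = fitgmdist(X, 2);                    % fit the two Gaussians
xi = linspace(min(X), max(X), 500)';
histogram(X, 'Normalization', 'pdf'); hold on
for k = 1:2
    % weighted pdf of component k: tall/skinny water vs. short/broad land
    plot(xi, gmm.ComponentProportion(k) * ...
             normpdf(xi, gmm.mu(k), sqrt(gmm.Sigma(:,:,k))));
end
hold off
```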
Another difference between k-means and GMM is in how the pixels are clustered. In GMM, the two distributions are used to assign a probability value to each pixel, so it's fuzzy - it doesn't say "this pixel definitely is land", it says (e.g.) "this pixel has a 30% chance of being water and a 70% chance of being land", so it assigns it as land. In this particular example the water histogram is very tight, so it (incorrectly, in this case) decides that it is very unlikely for that speckle noise to actually be water.
Upvotes: 6