K-means clustering using R - is the resulting plot off?

Question

I am trying to do k-means clustering using R, and this is what I have done so far:

tmp <- kmeans(ds, centers = 4, iter.max = 1000) 

plot(ds[tmp$cluster == 1, c(1, 5)], col = "red",
     xlim = c(min(ds[, 1]), max(ds[, 1])),
     ylim = c(min(ds[, 5]), max(ds[, 5])))
points(ds[tmp$cluster == 2, c(1, 5)], col = "blue")
points(ds[tmp$cluster == 3, c(1, 5)], col = "seagreen")
points(ds[tmp$cluster == 4, c(1, 5)], col = "orange")
points(tmp$centers[, c(1, 5)], col = "black")

and I get the following graph:

I am quite new to this, so I may be way off, but this graph does not look right to me. The data is basically divided into zones, and I was expecting to see something along the lines of this:

The circles in this picture are just to showcase where I was expecting the clusters to be. Can anyone explain why the data is clustered like that? I did the clustering multiple times and I always end up with this result.

The dataset I am using can be found here.

G5W · Accepted Answer

Notice that Age runs from about 18 to 60, so the largest difference between two ages is about 40. Incomes, by contrast, range from 0 to 20000. Because kmeans uses Euclidean distance on the raw values, the distance between points is heavily dominated by income, which is why the clusters split into zones along income. If you want both variables to contribute to the clustering, scale the data before clustering. Try

tmp <- kmeans(scale(ds), centers = 4, iter.max = 1000)
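
One thing to watch for after scaling: the cluster centers returned by kmeans are then in scaled units, so plotting them directly on the original axes would put them in the wrong place. The sketch below is one possible way to handle that. It uses a made-up data frame (ds_demo, with hypothetical Age and Income columns) as a stand-in for the real dataset, and nstart = 25 is an added assumption, not part of the original call.

```r
# Hypothetical stand-in for ds: two numeric columns on very different scales.
set.seed(42)
ds_demo <- data.frame(
  Age    = runif(200, min = 18, max = 60),
  Income = runif(200, min = 0,  max = 20000)
)

# Compare the spread of each column; Income spans a far wider range than Age.
col_ranges <- sapply(ds_demo, function(x) diff(range(x)))
print(col_ranges)

# Standardize each column to mean 0 and standard deviation 1.
ds_scaled <- scale(ds_demo)

# Cluster on the scaled data; nstart = 25 (an assumption) tries several
# random starting configurations and keeps the best one.
fit <- kmeans(ds_scaled, centers = 4, iter.max = 1000, nstart = 25)

# The centers are in scaled units; convert them back to the original units
# using the centering and scaling values that scale() stored as attributes.
mu  <- attr(ds_scaled, "scaled:center")
sdv <- attr(ds_scaled, "scaled:scale")
centers_orig <- sweep(sweep(fit$centers, 2, sdv, "*"), 2, mu, "+")

# Plot the original data colored by cluster, with the centers overlaid.
cluster_cols <- c("red", "blue", "seagreen", "orange")
plot(ds_demo$Age, ds_demo$Income, col = cluster_cols[fit$cluster],
     xlab = "Age", ylab = "Income")
points(centers_orig, pch = 4, cex = 2, lwd = 2, col = "black")
```

With the real data, scale(ds) only works if every column of ds is numeric, so any non-numeric columns would need to be dropped or encoded first.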
