Get cluster mean in k-means clustering analysis with R

Question

I created two clusters with the k-means algorithm; the data have 4 variables. To get the mean of each variable within each cluster, should I use:

clusteredsubset$centers

or

colMeans(y[clusteredsubset$cluster == 1,])
colMeans(y[clusteredsubset$cluster == 2,])

where y is the data matrix (4 columns) and clusteredsubset is the result of kmeans.

Zheyuan Li · Accepted Answer

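Either one is fine: they give the same result, because each row of centers holds the column means of the observations assigned to that cluster. Since kmeans already returns centers, you may as well use it.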

The following is based on the first example from ?kmeans:

set.seed(0)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
cl <- kmeans(x, 2)

## what `kmeans` returns
cl$centers
#              x            y
#1 -0.0008158201 -0.008394296
#2  0.9261878482  1.029984748

## manual computation
colMeans(x[cl$cluster == 1, ])
#            x             y 
#-0.0008158201 -0.0083942957 

colMeans(x[cl$cluster == 2, ])
#        x         y 
#0.9261878 1.0299847

The results are the same (the difference in the number of printed digits is just a printing effect).
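If you would rather check this programmatically than by eye, here is a minimal sketch. It rebuilds the same simulated x and cl as above so that it runs on its own, computes the means of all clusters in one call with rowsum(), and compares them to cl$centers with all.equal(). The name manual is only illustrative, and the comparison is expected to return TRUE.

```r
## same simulated data and fit as in the example above
set.seed(0)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
cl <- kmeans(x, 2)

## per-cluster column sums divided by the cluster sizes give the
## per-cluster column means (rows are ordered by cluster label 1, 2)
manual <- rowsum(x, cl$cluster) / as.vector(table(cl$cluster))

## compare with a numeric tolerance rather than `==`,
## since these are floating-point values
all.equal(manual, cl$centers, check.attributes = FALSE)
```

The same pattern works for any number of clusters or columns, so with the 4-column matrix y from the question you would only swap in y and clusteredsubset.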

## make a plot
plot(x, col = cl$cluster)
points(cl$centers, col = 1:2, pch = 8, cex = 2)
