user1885116

Reputation: 1797

kmeans classification to predetermined centroids

I am trying to assign data points to a known, predefined set of center points, with each point going to the fixed center that is closest in Euclidean distance.

I have the feeling that I am overcomplicating this or missing something basic. I tried to do it with a kmeans implementation with predetermined centers and no iterations. However, as the code below shows, and probably because the algorithm still performs one iteration, this does not work: cl$centers have "moved" and are no longer equal to the original centroids.

Is there another, simpler way of assigning the points in matrix x to the nearest centers?

Many thanks in advance, W

x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2), matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")

vector=c(0.25,0.5,0.75,1)
ccenters <- as.matrix(cbind(vector,vector))
colnames(ccenters) <- c("x", "y")
ccenters

(cl <- kmeans(x, centers=ccenters,iter.max=1))
plot(x, col = cl$cluster)
points(cl$centers, col = 1:4, pch = 8, cex = 2)
cl$centers
cl$centers==ccenters

Upvotes: 4

Views: 2678

Answers (1)

Vincent Zoonekynd

Reputation: 32351

You can directly compute the distances between each point and each center and look at the nearest center.

# All the squared Euclidean distances (you could also use a loop);
# squaring does not change which center is nearest
distances <- outer( 
  1:nrow(x), 
  1:nrow(ccenters), 
  Vectorize( function(i,j) { 
    sum( (x[i,] - ccenters[j,])^2 )
  } )
)

# Find the nearest cluster
clusters <- apply( distances, 1, which.min )

# Plot
plot( x, col=clusters, pch=15 )
segments( ccenters[clusters,1], ccenters[clusters,2], x[,1], x[,2], col=clusters )
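
For larger inputs, the same assignment can be sketched without calling a function once per point/center pair, by expanding the squared distance as ||p - c||^2 = ||p||^2 - 2 p.c + ||c||^2 and computing all pairs with matrix operations. The helper name `assign_nearest` and the example objects `pts` and `centers` are hypothetical and only used for illustration; this is a minimal sketch, not a replacement for the approach above.

```r
# Hypothetical helper (not from the original post): assign each row of `pts`
# to the nearest row of `centers` using squared Euclidean distance.
assign_nearest <- function(pts, centers) {
  pts <- as.matrix(pts)
  centers <- as.matrix(centers)
  # n x k matrix of squared distances:
  # ||p||^2 + ||c||^2 for every pair, minus twice the dot products
  d2 <- outer(rowSums(pts^2), rowSums(centers^2), "+") -
    2 * tcrossprod(pts, centers)
  # Index of the smallest squared distance in each row
  unname(apply(d2, 1, which.min))
}

# Example data shaped like the data in the question (assumed for illustration)
set.seed(1)
pts <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
v <- c(0.25, 0.5, 0.75, 1)
centers <- cbind(x = v, y = v)

clusters <- assign_nearest(pts, centers)
table(clusters)
```

With the objects from the question, the call would be `assign_nearest(x, ccenters)`. The centers are never updated, so they stay exactly where they were defined.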

Upvotes: 3
