vtroost

Reputation: 95

sorting kmeans cluster labels according to the input values

I have a range of values and I want to identify the cluster with the lowest values using kmeans. However, the cluster labels are not ordered the way I expected.

test <- c(1,4,5,12,17,18,33,34)
cl <- kmeans(test, centers = 3, nstart = 10)
cl$cluster
[1] 2 2 2 1 1 1 3 3
# whereas I would have expected to get
[1] 1 1 1 2 2 2 3 3

How can I sort the output from kmeans in the way that I want?

Upvotes: 0

Views: 695

Answers (1)

G5W

Reputation: 37641

The labels that kmeans assigns are arbitrary: they depend on the random starting centers, not on the size of the values, and you do not say precisely how you want the clusters ordered. Here is one way: order the clusters by the smallest value in each cluster. That produces the result you asked for on this test data.

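MT = aggregate(test, list(cl$cluster), min)   # smallest value in each cluster
match(cl$cluster, MT$Group.1[order(MT$x)])    # position of each old label when clusters are sorted by that minimum
[1] 1 1 1 2 2 2 3 3

If you want to propagate this change to cl, you can just make the assignment

cl$cluster = match(cl$cluster, MT$Group.1[order(MT$x)])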
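
Note that assigning to cl$cluster alone leaves cl$centers, cl$size and cl$withinss in the original label order. As a rough sketch, assuming one-dimensional input and that ordering by cluster center is acceptable, the whole fit can be relabeled together. The helper name relabel_by_center is made up for this illustration; it is not part of base R or the stats package.

```r
# Hypothetical helper (an assumption for illustration, not an existing R function):
# relabel a one-dimensional kmeans fit so that cluster 1 has the smallest center.
relabel_by_center <- function(fit) {
  ord <- order(fit$centers[, 1])              # old labels, sorted by center value
  fit$cluster  <- match(fit$cluster, ord)     # old label -> its position in that ordering
  fit$centers  <- fit$centers[ord, , drop = FALSE]
  rownames(fit$centers) <- seq_along(ord)     # keep row names in step with the new labels
  fit$size     <- fit$size[ord]
  fit$withinss <- fit$withinss[ord]
  fit
}

test <- c(1, 4, 5, 12, 17, 18, 33, 34)
set.seed(1)                                   # arbitrary seed, only for reproducibility
cl <- relabel_by_center(kmeans(test, centers = 3, nstart = 10))

cl$cluster                                    # labels now increase with the values
lowest <- test[cl$cluster == 1]               # the points in the lowest-valued cluster
```

After this, cl$centers[1, ] is the center of the lowest cluster, so labels and centers stay consistent with each other.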

Upvotes: 1
