grammar

Reputation: 939

Renumber the clusters produced by kmeans in R according to cluster center

I am using plain kmeans in R to cluster a single vector. Since the cluster numbers are assigned rather arbitrarily (I presume), I need to renumber them so that they follow the order of the cluster centers.

Here is an example:

> vals <- c(0.22, 0.17, 0.21, 0.13, 0.00)
> set.seed(32833)
> cl <- kmeans(vals ,3)

> cl$cluster
[1] 2 3 2 3 1

> cl$centers
   [,1]
1 0.000
2 0.215
3 0.150

As you can see from the cluster centers, the clusters in ascending order of center are 1, 3, 2.

I want to return a vector of identified clusters transformed accordingly:

e.g. transform(cl$cluster) should give me 3 2 3 2 1.

I have tried changing the factor levels by ordering them, but I could not get it to work:

> cl$cluster <- as.factor(as.character(cl$cluster))
> levels(cl$cluster) <- order(-cl$centers)
> cl$cluster
[1] 3 1 3 1 2
Levels: 2 3 1

Upvotes: 0

Views: 890

Answers (2)

Guillaume Malherbe

Reputation: 11

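I think the right answer is to use rank() rather than order() in the last line. order() returns the indices that would sort the centers, whereas rank() returns the position of each center in the sorted sequence, which is what the new cluster number should be. In this particular example the two give the same result, but in other cases the result from order() is wrong.

Here is another example, with 4 clusters, where rank() is needed to renumber the clusters in increasing order of center: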

vals <- c(0.22, 0.17, 0.21, 0.13, 0.00, 0.40, 0.50)
set.seed(32833)
cl <- kmeans(vals ,4)

cl$cluster
[1] 4 2 4 2 3 1 1

cl$centers
   [,1]
1 0.450
2 0.150
3 0.000
4 0.215

order(cl$centers)[cl$cluster]
[1] 1 2 1 2 4 3 3

rank(cl$centers)[cl$cluster]
[1] 3 2 3 2 1 4 4

The correct result here is obtained with rank(), which works every time.
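To see why the two functions differ, here is a minimal sketch that skips kmeans() and simply copies the centers and labels from the 4-cluster example above. The variable names (centers, cluster, ord, rnk, new_cluster) are only illustrative, and the sketch assumes the centers have no ties.

```r
# Centers and labels copied from the 4-cluster example above
centers <- c(0.450, 0.150, 0.000, 0.215)
cluster <- c(4, 2, 4, 2, 3, 1, 1)

# order(): which cluster has the smallest center, the second smallest, ...
ord <- order(centers)   # 3 2 4 1

# rank(): which position each cluster's center holds in the sorted sequence
rnk <- rank(centers)    # 4 2 1 3

# Without ties, rank() is the inverse permutation of order()
stopifnot(all(order(ord) == rnk))

# Looking up each old label in rnk gives the new, ordered label
new_cluster <- as.integer(rnk[cluster])   # 3 2 3 2 1 4 4
```

Indexing with ord instead maps old label k to "the cluster that sits at position k", which only matches the desired result when the permutation happens to be its own inverse, as in the 3-cluster example from the question.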

Upvotes: 1

Tad Dallas

Reputation: 1189

You could index the cluster vector by the order of the center vector. In your example,

vals <- c(0.22, 0.17, 0.21, 0.13, 0.00)
set.seed(32833)
cl <- kmeans(vals ,3)

cl$cluster
[1] 2 3 2 3 1

cl$centers
   [,1]
1 0.000
2 0.215
3 0.150

order(cl$centers)[cl$cluster]
[1] 3 2 3 2 1

Someone else can chime in with an as.factor solution, as that's an option as well.
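For what it is worth, one possible factor-based version is sketched below. The helper name relabel_by_center is hypothetical (it is not part of base R or the stats package), and the sketch assumes one-dimensional input so that cl$centers has a single column. The idea is to list the factor levels in ascending order of center, so that the integer code of each level is its position in that order.

```r
# Hypothetical helper: renumber kmeans clusters by ascending center.
# Assumes `cl` comes from kmeans() on a one-dimensional vector.
relabel_by_center <- function(cl) {
  centers <- as.vector(cl$centers)
  # Levels are the old labels, listed from smallest to largest center
  f <- factor(cl$cluster, levels = order(centers))
  # The integer code of a factor is the position of its level,
  # i.e. the position of the cluster's center in ascending order
  as.integer(f)
}

vals <- c(0.22, 0.17, 0.21, 0.13, 0.00)
set.seed(32833)
cl <- kmeans(vals, 3)

new_labels <- relabel_by_center(cl)

# Sanity check: the mean of each relabeled cluster should increase
# with the label
tapply(vals, new_labels, mean)
```

Because the levels are given by order() and the labels are then read back as level positions, this should yield the same labels as rank(cl$centers)[cl$cluster] whenever the centers have no ties.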

Upvotes: 0
