Reputation: 939
I am using simple kmeans in R to cluster a single vector. Since cluster numbers are assigned rather arbitrarily (I presume), I need to renumber them in order of their cluster centers.
Here is an example:
> vals <- c(0.22, 0.17, 0.21, 0.13, 0.00)
> set.seed(32833)
> cl <- kmeans(vals ,3)
> cl$cluster
[1] 2 3 2 3 1
> cl$centers
   [,1]
1 0.000
2 0.215
3 0.150
As you can see from the cluster centers, the clusters in ascending order of cluster center are: 1, 3, 2.
I want to return a vector of identified clusters transformed accordingly:
e.g. transform(cl$cluster) should give me 3 2 3 2 1.
I have tried changing the factor levels by ordering, but I have not been able to get it to the desired result.
> cl$cluster <- as.factor(as.character(cl$cluster))
> levels(cl$cluster) <- order(-cl$centers)
> cl$cluster
[1] 3 1 3 1 2
Levels: 2 3 1
Upvotes: 0
Views: 890
Reputation: 11
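I think the correct approach is to use rank() rather than order() in the final indexing step. In the example from the question the two give the same result, but in other cases the result with order() is wrong. order() returns the old cluster numbers sorted by center, whereas rank() returns, for each old cluster number, its position in that sorted sequence, and that position is what is needed when indexing by cl$cluster.
Here is another example, with 4 clusters, where rank() is needed to renumber the clusters in increasing order of their centers: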
vals <- c(0.22, 0.17, 0.21, 0.13, 0.00, 0.40, 0.50)
set.seed(32833)
cl <- kmeans(vals ,4)
cl$cluster
[1] 4 2 4 2 3 1 1
cl$centers
   [,1]
1 0.450
2 0.150
3 0.000
4 0.215
order(cl$centers)[cl$cluster]
[1] 1 2 1 2 4 3 3
rank(cl$centers)[cl$cluster]
[1] 3 2 3 2 1 4 4
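The correct result here is obtained with rank(), which works in every case. The order() version is wrong: for instance, it assigns label 4 to the value 0.00, which belongs to the cluster with the smallest center.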
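To make the relationship between the two functions concrete, here is a minimal sketch. It hard-codes the centers and cluster labels printed above instead of re-running kmeans(), because the numbering depends on the random start; the variable names are only illustrative.

```r
# Values copied from the 4-cluster output above (not a live kmeans() fit).
centers <- c(0.450, 0.150, 0.000, 0.215)
cluster <- c(4, 2, 4, 2, 3, 1, 1)

ord <- order(centers)  # old labels sorted by center: 3 2 4 1
rnk <- rank(centers)   # new label for each old label: 4 2 1 3

# rank() is the inverse permutation of order() when there are no ties.
stopifnot(all(order(ord) == rnk))

by_rank  <- rnk[cluster]         # look up the new label of each point
by_match <- match(cluster, ord)  # same mapping, expressed through order()

by_rank   # 3 2 3 2 1 4 4
by_match  # 3 2 3 2 1 4 4
```

In the 3-cluster example from the question the permutation 1 3 2 happens to be its own inverse, which is why order() and rank() agree there.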
Upvotes: 1
Reputation: 1189
You could index the ordering of the centers by the cluster vector. In your example,
vals <- c(0.22, 0.17, 0.21, 0.13, 0.00)
set.seed(32833)
cl <- kmeans(vals ,3)
cl$cluster
[1] 2 3 2 3 1
cl$centers
   [,1]
1 0.000
2 0.215
3 0.150
order(cl$centers)[cl$cluster]
[1] 3 2 3 2 1
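If the centers should be renumbered together with the labels, one possible sketch is below. sort_clusters() is a hypothetical helper, not part of R, and the input is a hand-built list holding the values printed above rather than a real kmeans() result. It assumes one-dimensional data. Here order() sorts the rows of the centers, while match() against that ordering relabels the cluster vector.

```r
# Hypothetical helper: renumber clusters so that cluster 1 has the smallest center.
sort_clusters <- function(cl) {
  ord <- order(cl$centers[, 1])                  # old labels, smallest center first
  cl$cluster <- match(cl$cluster, ord)           # old label -> its position in ord
  cl$centers <- cl$centers[ord, , drop = FALSE]  # rows follow the new numbering
  rownames(cl$centers) <- as.character(seq_along(ord))
  cl
}

# Hand-built stand-in for the kmeans() output shown above.
cl <- list(
  cluster = c(2L, 3L, 2L, 3L, 1L),
  centers = matrix(c(0.000, 0.215, 0.150), ncol = 1,
                   dimnames = list(as.character(1:3), NULL))
)

sorted <- sort_clusters(cl)
sorted$cluster       # 3 2 3 2 1
sorted$centers[, 1]  # 0.000 0.150 0.215, now in ascending order
```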
Someone else can chime in with an as.factor solution, as that's an option as well.
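For what it's worth, a factor-based version might look like the sketch below. It again uses the printed values rather than a live kmeans() call, and the variable names are only illustrative. The idea is to pass the desired level order when the factor is created: assigning to levels() afterwards renames the existing levels instead of reordering them, which is why the attempt in the question gives unexpected labels.

```r
# Values copied from the 3-cluster output in the question.
centers <- c(0.000, 0.215, 0.150)
cluster <- c(2, 3, 2, 3, 1)

# Levels are the old cluster numbers, listed from smallest to largest center.
f <- factor(cluster, levels = order(centers))

# The integer codes of the factor are the new, center-ordered labels.
new_cluster <- as.integer(f)
new_cluster  # 3 2 3 2 1
```

This gives the same mapping as rank(centers)[cluster] whenever the centers are distinct.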
Upvotes: 0