Reputation: 11

Determine cluster membership in R

This is my vector before kmeans -

> sort(table(mydata))
mydata
23  7  9  4 10  3  5  8  2  1 
 1  3  3  4  5  6  6  6  7  9

km <- kmeans(mydata, centers = 10)

After kmeans -

> sort(table(km$cluster))
km$cluster
 1  6  7  3  5  2  4 10  8  9 
 1  3  3  4  5  6  6  6  7  9

Clearly, all my 1s are in cluster 9, all my 2s are in cluster 8, and so on.

Can I use R to find which cluster a particular number belongs to? For example, which cluster are my 1s in?

Upvotes: 0

Answers (2)

desertnaut

Reputation: 60390

Extending MrFlick's answer (upvoted): in case you want the cluster number programmatically, you could also do this (using the magrittr package to get rid of all the nested parentheses):

library(magrittr)
data.point <- 5  # put the data point here
cluster.no <- c(mydata==data.point)  %>% which %>% km$cluster[.] %>% unique

Examples:

library(magrittr)
set.seed(42)  # for reproducibility
mydata <- rep(c(23,7,9,4,10,3,5,8,2,1), c(1,3,3,4,5,6,6,6,7,9))
km <- kmeans(mydata, centers = 10) 

data.point <- 23
c(mydata==data.point)  %>% which %>% km$cluster[.] %>% unique
# 8
data.point <- 10
c(mydata==data.point)  %>% which %>% km$cluster[.] %>% unique
# 1
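
If you would rather not load magrittr, the same lookup can be sketched in base R with logical indexing. This is only an equivalent formulation of the pipeline above, reusing the same mydata and km; the cluster number you get depends on the seed, so no expected output is shown.

```r
set.seed(42)  # for reproducibility
mydata <- rep(c(23, 7, 9, 4, 10, 3, 5, 8, 2, 1), c(1, 3, 3, 4, 5, 6, 6, 6, 7, 9))
km <- kmeans(mydata, centers = 10)

data.point <- 5  # put the data point here

# km$cluster is aligned with mydata, so the logical vector mydata == data.point
# selects the cluster labels of exactly those elements; unique() collapses repeats
cluster.no <- unique(km$cluster[mydata == data.point])
cluster.no
```

Note that == does exact matching, which is fine for discrete values like these but can be fragile for non-integer data.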

Upvotes: 2

MrFlick

Reputation: 206546

The values in km$cluster are returned in the same order as your original data, so the i-th cluster label belongs to the i-th element of mydata.

mydata <- rep(c(23,7,9,4,10,3,5,8,2,1), c(1,3,3,4,5,6,6,6,7,9))
sort(table(mydata))
# mydata
# 23  7  9  4 10  3  5  8  2  1 
#  1  3  3  4  5  6  6  6  7  9 

km <- kmeans(mydata, centers = 10) 
unique(cbind(value=mydata, clust=km$cluster))
#       value clust
#  [1,]    23     9
#  [2,]     7     5
#  [3,]     9     7
#  [4,]     4     4
#  [5,]    10     1
#  [6,]     3    10
#  [7,]     5     2
#  [8,]     8     8
#  [9,]     2     6
# [10,]     1     3

Here I've just re-joined the two with cbind and used unique to eliminate all the duplicates, since your data is so discrete.
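
To answer the "which cluster are my 1s in?" part directly, the joined table can be kept as a small lookup and filtered by value. The following is a minimal sketch along those lines; the names lookup, clusters_of_1 and values_by_cluster are chosen here for illustration, and the actual cluster numbers will vary from run to run unless you set a seed.

```r
set.seed(42)  # for reproducibility
mydata <- rep(c(23, 7, 9, 4, 10, 3, 5, 8, 2, 1), c(1, 3, 3, 4, 5, 6, 6, 6, 7, 9))
km <- kmeans(mydata, centers = 10)

# One row per distinct (value, cluster) pair
lookup <- unique(data.frame(value = mydata, clust = km$cluster))

# Which cluster(s) hold the 1s?
clusters_of_1 <- lookup$clust[lookup$value == 1]
clusters_of_1

# The reverse view: the distinct values that ended up in each cluster
values_by_cluster <- lapply(split(mydata, km$cluster), unique)
values_by_cluster
```

A data frame is used instead of cbind only so the columns can be addressed by name with $.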

Upvotes: 5
