I started with a list of hobbies and people, and I wanted to cluster the people by their common hobbies. So I created a distance matrix, applied hierarchical clustering, and used cutree to cut the tree into a specific number of clusters. Now I have the cutree matrix, but I do not know how to extract the clusters from it. Could you please advise?
Here is an example of what I mean.
The distance matrix:
      one three two
one     0   1.0 1.0
three   1   0.0 0.5
two     1   0.5 0.0
Then I used hclust and cutree and got this result:
hc <- hclust(as.dist(dist), method = "ward.D")  # "ward" is the pre-R 3.1.0 name for "ward.D"
ct <- cutree(hc, k=1:3)
      1 2 3
one   1 1 1
three 1 2 2
two   1 2 3
How do I get a list of people that belong in the same cluster?
Thank you for your help.
Your k=1:3 gives the cluster assignment of every observation for each $k \in \{1, 2, 3\}$, with one column per value of k. To bundle the observations by cluster, pick the column for the number of clusters you are interested in (say, without loss of generality, 2) and group the row names of the matrix by the entries of that column.
Example:
hc <- hclust(dist(USArrests))
memb <- cutree(hc, k = 1:5)  ## one column of cluster ids for each k = 1, ..., 5
tapply(names(memb[, 3]), memb[, 3], c)  ## say we're interested in 3 clusters
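As a minimal sketch, the same tapply() grouping can be applied to the distance matrix from the question. The object names m, hc_q, ct_q, k2 and groups are illustrative, and method = "ward.D" is assumed because current R versions use that name for the original "ward" option.

```r
# Rebuild the question's distance matrix (labels in the order one, three, two)
m <- matrix(c(0, 1.0, 1.0,
              1, 0.0, 0.5,
              1, 0.5, 0.0), ncol = 3, byrow = TRUE,
            dimnames = list(c("one", "three", "two"), c("one", "three", "two")))
hc_q <- hclust(as.dist(m), method = "ward.D")
ct_q <- cutree(hc_q, k = 1:3)

# Take the column for k = 2 (a named vector of cluster ids) and
# group the observation names by their cluster id
k2 <- ct_q[, "2"]
groups <- tapply(names(k2), k2, c)
groups
```

Because the groups have different sizes, tapply() returns a list (one element per cluster) rather than simplifying to a vector.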
ct is a matrix, so you can index its columns to get the cluster membership for 1, 2, or 3 groups. For example,
ct[, 2]
gives the non-trivial solution of assigning the 3 observations to 2 groups.
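Positional indexing like ct[, 2] only lines up with the number of clusters because k = 1:3 starts at 1. A small sketch (names m, by_name and by_scalar are illustrative) showing that cutree() labels the columns with the k values, so selecting by label, or calling cutree() with a single k, gives the same membership vector:

```r
# Toy distance matrix copied from the question (labels one, three, two)
m <- matrix(c(0, 1.0, 1.0,
              1, 0.0, 0.5,
              1, 0.5, 0.0), ncol = 3, byrow = TRUE,
            dimnames = list(c("one", "three", "two"), c("one", "three", "two")))
hc <- hclust(as.dist(m), method = "ward.D")

ct <- cutree(hc, k = 1:3)   # matrix: one column per requested value of k
colnames(ct)                # columns are labelled with the k values

by_name   <- ct[, "2"]           # select the k = 2 column by its label
by_scalar <- cutree(hc, k = 2)   # or ask cutree() for a single k directly
all(by_name == by_scalar)
```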
To get the observations in each cluster, first recreate your data:
Dij <- matrix(c(0, 1.0, 1.0,
1, 0.0, 0.5,
1, 0.5, 0.0), ncol = 3, byrow = TRUE)
rownames(Dij) <- colnames(Dij) <- c("one", "two", "three")
hc <- hclust(as.dist(Dij), method = "ward.D")  # "ward" was renamed "ward.D" in R 3.1.0
ct <- cutree(hc, k=1:3)
Then you can use the split() function to split the row names of ct (which are your observation/sample identifiers from the distance matrix Dij) by the membership vector in whichever column of ct you want to use. For example:
> split(rownames(ct), ct[,2])
$`1`
[1] "one"
$`2`
[1] "two" "three"
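If you want the clusters for every k at once, one option is to run the same split() over each column of ct. This is a sketch; clusters_by_k is an illustrative name, setNames() comes from the stats package (attached by default), and method = "ward.D" is assumed as above.

```r
Dij <- matrix(c(0, 1.0, 1.0,
                1, 0.0, 0.5,
                1, 0.5, 0.0), ncol = 3, byrow = TRUE)
rownames(Dij) <- colnames(Dij) <- c("one", "two", "three")
hc <- hclust(as.dist(Dij), method = "ward.D")
ct <- cutree(hc, k = 1:3)

# Apply split() to each column of ct; the outer list is named by k,
# and each inner list is named by cluster id
clusters_by_k <- lapply(setNames(colnames(ct), colnames(ct)),
                        function(k) split(rownames(ct), ct[, k]))
clusters_by_k[["2"]]
```

clusters_by_k[["2"]] holds the same two groups as the split() call above, and clusters_by_k[["3"]] puts each observation in its own cluster.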