Cluster groups based on pairwise distances

Question

I have an n x n matrix with pairwise distances as entries. The matrix looks for example like this:

m <- matrix(c(0, 0, 1, 1, 1, 1,
              0, 0, 1, 1, 0, 1,
              1, 1, 0, 1, 1, 0,
              1, 1, 1, 0, 1, 1,
              1, 0, 1, 1, 0, 1,
              1, 1, 0, 1, 1, 0),
            ncol = 6, byrow = TRUE)
colnames(m) <- c("A","B","C","D","E","F")
rownames(m) <- c("A","B","C","D","E","F")

Now I want to put two letters in the same cluster whenever the distance between them is 0, either directly or through a chain of other letters. For the example above, I should get three clusters consisting of:

(A,B,E)

(C,F)

(D)

I would be interested in the number of entries in each cluster. At the end, I want to have a vector like:

clustersizes = c(3,2,1)

I assume it is possible with the hclust function, but I'm not able to extract the three clusters. I also tried the cutree function, but I know neither the number of clusters in advance nor the cutoff for the height, so how should I do it?

This is what I tried:

h <- hclust(dist(m),method="single")
plot(h)

Thanks!

ekstroem · Accepted Answer

Welcome to SO.

There are several ways to handle this, but an easy choice is to use the igraph package.

First we convert your matrix m to an adjacency matrix, in which 1 marks a connection between two nodes and 0 means no connection. Since your distances are all 0 or 1, subtracting your matrix from 1 gives exactly that:

mm <- 1 - m  
diag(mm) <- 0 # We don't allow loops

This gives

> mm
  A B C D E F
A 0 1 0 0 0 0
B 1 0 0 0 1 0
C 0 0 0 0 0 1
D 0 0 0 0 0 0
E 0 1 0 0 0 0
F 0 0 1 0 0 0
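
The 1 - m step relies on every distance being exactly 0 or 1. As a minimal sketch, assuming your real data may contain other distance values, the same adjacency matrix can be built by testing for zero explicitly. The name mm_general is only illustrative and not part of the original code.

```r
# Distance matrix from the question
m <- matrix(c(0, 0, 1, 1, 1, 1,
              0, 0, 1, 1, 0, 1,
              1, 1, 0, 1, 1, 0,
              1, 1, 1, 0, 1, 1,
              1, 0, 1, 1, 0, 1,
              1, 1, 0, 1, 1, 0),
            ncol = 6, byrow = TRUE,
            dimnames = list(LETTERS[1:6], LETTERS[1:6]))

# Mark a link wherever the distance is exactly zero
# (hypothetical name; works for arbitrary non-negative distances)
mm_general <- (m == 0) * 1
diag(mm_general) <- 0  # no self-loops

# For the 0/1 example this matches the 1 - m construction
mm <- 1 - m
diag(mm) <- 0
all(mm_general == mm)
```

If the distances come from floating-point computations, comparing against a small tolerance instead of exact zero may be safer.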

Then we just need to feed it to igraph to compute communities

library("igraph")
fastgreedy.community(as.undirected(graph.adjacency(mm)))

which produces

IGRAPH clustering fast greedy, groups: 3, mod: 0.44
+ groups:
  $`1`
  [1] "A" "B" "E"

  $`2`
  [1] "C" "F"

  $`3`
  [1] "D"

Now if you save that result you can get the community sizes right away

res <- fastgreedy.community(as.undirected(graph.adjacency(mm)))
sizes(res)

which yields

Community sizes
1 2 3 
3 2 1
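
Note that fast greedy community detection optimises modularity, so on larger graphs it is not guaranteed to return exactly the groups linked by zero distances. The rule in the question (same cluster whenever a chain of zero distances connects two letters) corresponds to the connected components of the graph. The following is a sketch of two alternative routes under that reading, not part of the original answer: igraph's components(), and base R single linkage cut at height 0, which also answers the cutree question. It uses the newer igraph name graph_from_adjacency_matrix; the variable names adj, clustersizes and clustersizes_hclust are illustrative.

```r
library(igraph)

# Distance matrix from the question
m <- matrix(c(0, 0, 1, 1, 1, 1,
              0, 0, 1, 1, 0, 1,
              1, 1, 0, 1, 1, 0,
              1, 1, 1, 0, 1, 1,
              1, 0, 1, 1, 0, 1,
              1, 1, 0, 1, 1, 0),
            ncol = 6, byrow = TRUE,
            dimnames = list(LETTERS[1:6], LETTERS[1:6]))

# Adjacency matrix: 1 where the distance is exactly 0, no self-loops
adj <- (m == 0) * 1
diag(adj) <- 0

# Route 1: connected components of the zero-distance graph
g <- graph_from_adjacency_matrix(adj, mode = "undirected")
comp <- components(g)
split(V(g)$name, comp$membership)                 # members of each cluster
clustersizes <- sort(comp$csize, decreasing = TRUE)

# Route 2 (base R only): single linkage, cutting the tree at height 0.
# as.dist() is used because m already holds distances.
h <- hclust(as.dist(m), method = "single")
groups <- cutree(h, h = 0)
clustersizes_hclust <- sort(as.vector(table(groups)), decreasing = TRUE)
```

For the example matrix both routes give cluster sizes 3, 2 and 1. Cutting at h = 0 works without knowing the number of clusters because single linkage merges two groups at height 0 exactly when some pair across them has distance 0.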
