Reputation: 55
I'm currently trying to get my head around unsupervised machine learning, i.e. clustering, and I'm a bit confused.
First of all, here is why I need a clustering algorithm. I computed an N x N dissimilarity matrix that compares binary trees with each other. The diagonal entries N[i,i] are zero and the off-diagonal entries N[i,j] are ≥ 0. The matrix has 100 x 100 elements, i.e. I have 100 binary trees which I compare with each other. The matrix is computed outside of R. The distances in it are tree edit distances and satisfy the triangle inequality.
Which clustering algorithms am I actually allowed to use with just this information? I'm pretty sure I can use hierarchical clustering, but how would I perform k-means or PAM clustering in R with just this matrix?
Upvotes: 3
Views: 1342
Reputation: 77454
You can't use k-means, because it needs to compute means and the distance of each object from those means. A mean of trees is not defined, so that won't work here.
HAC, PAM and DBSCAN are fine, since all three only need pairwise distances. DBSCAN is the most scalable of the three, but it also works better when you have enough data, and your sample of 100 trees may be too small for it. So I'd use HAC.
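As a rough sketch of what HAC and PAM could look like in R on a precomputed matrix: the code below assumes the dissimilarities are already available as a square numeric matrix called `D` (a hypothetical name). Because the real tree edit distances are computed outside of R, a small stand-in matrix is generated here only so that the example runs. The choices `k = 2` and `method = "average"` are arbitrary assumptions, not recommendations for the tree data.

```r
library(cluster)  # provides pam(); hclust(), cutree(), as.dist() are in base R's stats

# Stand-in data (assumption): in practice D would be the 100 x 100 matrix of
# tree edit distances loaded from wherever it was computed.
set.seed(1)
pts <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
             matrix(rnorm(20, mean = 5), ncol = 2))
D <- as.matrix(dist(pts))

# Convert the square matrix into a "dist" object, which both functions accept.
d <- as.dist(D)

# Hierarchical agglomerative clustering directly on the dissimilarities.
hc <- hclust(d, method = "average")
hac_labels <- cutree(hc, k = 2)    # cut the dendrogram into 2 clusters

# PAM (k-medoids): cluster centres are actual objects, so no mean is needed.
pam_fit <- pam(d, k = 2, diss = TRUE)
pam_labels <- pam_fit$clustering   # cluster index for each object
pam_medoids <- pam_fit$medoids     # the objects chosen as medoids
```

Calling `plot(hc)` draws the dendrogram, which can help when deciding where to cut instead of fixing `k` up front.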
Upvotes: 2