Reputation: 339
I am trying to cluster some words.
Part of my data is shown below (it's just an example).
          cat   dog   horse   ostrich
cat       8     2.3   3.4     4.7
dog       7     8     3       2.4
horse     3.4   2.5   8       1.5
ostrich   3.4   3.2   4.4     8
A bigger number means that the similarity between the two words is higher. Based on data in this format, I want to build clusters (for example, (cat, dog), (horse), (ostrich), 3 clusters in total).
At first, I tried to use CLUTO to make some clusters and a (very beautiful) graph like the one below.
But I couldn't. I have already read the manuals, but they are not that easy to understand. So I tried some of the clustering libraries in nltk, such as k-means, but I don't know how to create a graph like the one above with them. (I also have to build the clusters from the input data.)
Upvotes: 1
Views: 915
Reputation: 23322
The image you present is a hierarchical clustering (a dendrogram). Unlike "typical" cluster analysis, it shows not a single way of clustering the data but a nested sequence of clusterings, one for each possible number of clusters. You get one "cluster set" by drawing an arbitrary horizontal line across the hierarchy image: each branch the line intersects is one cluster, so the number of intersections is the number of clusters.
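To make the "cut the hierarchy" idea concrete, here is a minimal pure-Python sketch of agglomerative clustering on the matrix from the question. It is not what CLUTO does internally; it is just one common variant (average linkage). Two things are my own assumptions: since the matrix is not symmetric, I average sim[i][j] and sim[j][i], and I turn similarity into a distance as (largest similarity minus similarity). The helper names (to_distance, mean_gap, average_linkage, cut) are made up for this illustration.

```python
from itertools import combinations

words = ["cat", "dog", "horse", "ostrich"]
# Similarity matrix copied from the question (rows/columns in `words` order).
sim = [
    [8.0, 2.3, 3.4, 4.7],
    [7.0, 8.0, 3.0, 2.4],
    [3.4, 2.5, 8.0, 1.5],
    [3.4, 3.2, 4.4, 8.0],
]

def to_distance(sim):
    """Assumption: symmetrise by averaging, then use (max similarity - similarity)."""
    n = len(sim)
    top = max(max(row) for row in sim)
    return [[0.0 if i == j else top - (sim[i][j] + sim[j][i]) / 2
             for j in range(n)] for i in range(n)]

def mean_gap(dist, a, b):
    """Average-linkage distance: mean pairwise distance between two clusters."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

def average_linkage(dist):
    """Return the merge history as a list of (cluster_a, cluster_b, distance)."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters that are currently closest to each other.
        a, b = min(combinations(clusters, 2), key=lambda p: mean_gap(dist, *p))
        merges.append((a, b, mean_gap(dist, a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

def cut(merges, n_items, k):
    """Replay the first (n_items - k) merges to obtain exactly k clusters."""
    clusters = [frozenset([i]) for i in range(n_items)]
    for a, b, _ in merges[:n_items - k]:
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters

merges = average_linkage(to_distance(sim))
for k in (3, 2):
    groups = sorted(sorted(words[i] for i in c) for c in cut(merges, len(words), k))
    print(k, groups)
```

The merge history is the hierarchy; each value of k is one horizontal cut through it. With the example data and the assumptions above, cutting at three clusters gives (cat, dog), (horse), (ostrich), and cutting at two gives (cat, dog, ostrich), (horse). For real data and for drawing the picture, SciPy's scipy.cluster.hierarchy module (linkage, fcluster, dendrogram) does the same job far more efficiently.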
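The K-means algorithm, on the other hand, depends on you providing the number of clusters you want, so you can't generate a hierarchy from it. NLTK does ship a group-average agglomerative clusterer (nltk.cluster.GAAClusterer), but it expects feature vectors rather than a precomputed similarity matrix like yours.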
You should probably familiarize yourself with the basic clustering concepts before deciding what output you want.
Upvotes: 1