GoodGJ

Reputation: 339

Clustering words by using numpy and nltk or CLUTO in Python programming

I am trying to cluster some words.
Part of my data is shown below (it's just an example).

             cat  dog  horse  ostrich
    cat      8    2.3  3.4    4.7
    dog      7    8    3      2.4
    horse    3.4  2.5  8      1.5
    ostrich  3.4  3.2  4.4    8

A bigger number means that the similarity between two words is higher. Based on data in this format, I want to form clusters (for example, (cat, dog), (horse), (ostrich): 3 clusters in total).

At first, I tried to use CLUTO to make some clusters and a (very beautiful) graph like the one below. [image]

But I couldn't get it to work. I have read the manuals, but they are not easy to understand. So I tried some of the clustering tools in nltk, such as k-means, but I don't know how to create a graph like the one above. (I also need to form the clusters from the input data.)

Upvotes: 1

Views: 915

Answers (1)

loopbackbee

Reputation: 23322

The image you present is a hierarchical clustering (a dendrogram). Unlike "typical" cluster analysis, it shows not one way of clustering the data but a nested series of clusterings, one for each possible number of clusters. You get one "cluster set" by drawing an arbitrary horizontal line across the hierarchy image and counting the branches it intersects.

The K-means algorithm, on the other hand, depends on you providing the number of clusters you want, so you can't generate a hierarchy from it. NLTK doesn't seem to provide tools for hierarchical cluster analysis.

You should probably familiarize yourself with the basic clustering concepts before deciding what output you want.

Upvotes: 1
