Adnan Hussein
Adnan Hussein

Reputation: 15

How is to implement K-Means clustering algorithm with Cosine Distance measure?

I need to run K-means clustering algorithm to cluster textual data but by using cosine distance measure instead of Euclidean distance. Any reliable implementation of this in python?

Edit:

I have tried to use NLTK as following:

NUM_CLUSTERS=3
kclusterer = KMeansClusterer(NUM_CLUSTERS, distance= 
            nltk.cluster.util.cosine_distance, repeats=25)

clstr = kclusterer.cluster(X, clusters=False, trace=False)
print (clstr)

But it gives me error:

TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0]

X here is a TF-IDF matrix of shape (15, 155).

Upvotes: 0

Views: 1498

Answers (2)

MaKaNu
MaKaNu

Reputation: 1008

If you want to do it yourself: https://stanford.edu/~cpiech/cs221/handouts/kmeans.html

just change the distance measruing entry. The distance measuring is in the for loop over i of the pseudo code.

Upvotes: 1

joostblack
joostblack

Reputation: 2525

You can use NLTK for this. The K-means from NLTK allows you to specify which measure of distance you want to use.

nltk

Upvotes: 0

Related Questions