iamdeit

Reputation: 6045

How to calculate the distance between a document and each centroid (k-means)?

I ran scikit-learn's k-means algorithm and obtained the resulting centroids. I now have a new document (one that was not in the initial collection), and I would like to calculate the distance between it and every centroid so I know which cluster it should be placed in.

Is there a built-in function for this, or should I write a similarity function myself?

Upvotes: 0

Views: 845

Answers (1)

Thomas Moreau

Reputation: 4467

You can use the predict method of the fitted model to get the closest cluster for each sample in a matrix X:

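from sklearn.cluster import KMeans

# K is the number of clusters; X_train holds the vectorized training documents
model = KMeans(n_clusters=K)
model.fit(X_train)
# X_test must be vectorized the same way as X_train (same features, same order)
label = model.predict(X_test)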
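Since the question asks for the distances themselves, here is a minimal sketch of one way to get them. It relies on KMeans.transform, which returns the Euclidean distance from each sample to every cluster center. The toy corpus, the new document, and the choice of TfidfVectorizer are assumptions made for illustration only; the original post does not say how the documents were vectorized.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the original collection
train_docs = [
    "the cat sat on the mat",
    "cats and dogs are pets",
    "stock markets fell sharply today",
    "investors sold shares as markets dropped",
]
# Hypothetical new document that was not in the collection
new_doc = ["my cat chased the dog"]

# Assumption: documents are turned into TF-IDF vectors
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)

model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(X_train)

# Reuse the fitted vectorizer so the new document has the same features
X_new = vectorizer.transform(new_doc)

# Euclidean distance from the new document to each centroid, shape (1, n_clusters)
distances = model.transform(X_new)

# Index of the closest centroid
label = model.predict(X_new)

print(distances[0], label[0])
```

The column with the smallest value in distances corresponds to the cluster that predict returns. Note that these are Euclidean distances; if you need a different measure, such as cosine similarity, you would have to compute it yourself against model.cluster_centers_.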

Upvotes: 1
