Reputation: 6045
I ran scikit-learn's k-means algorithm and obtained the resulting centroids. I have a new document (one that was not in the initial collection), and I would like to calculate the distance between each centroid and the new document to determine which cluster it belongs in.
Is there a built-in function for this, or do I need to write a similarity function myself?
Upvotes: 0
Views: 845
Reputation: 4467
You can use the predict method to get the closest cluster for each sample in a matrix X:
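from sklearn.cluster import KMeans

# K is the number of clusters; X_train holds the vectorized original documents
model = KMeans(n_clusters=K)
model.fit(X_train)

# X_test holds the new documents, vectorized the same way as X_train
label = model.predict(X_test)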
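If you also want the actual distance from the new document to every centroid, not just the winning label, KMeans has a transform method that returns those Euclidean distances. Here is a minimal sketch, assuming the documents were vectorized with TfidfVectorizer; the toy corpus, the choice of 2 clusters, and the variable names are made up for illustration. The key point is that the new document has to go through the same fitted vectorizer (transform, not fit_transform) so that it lives in the same feature space as the centroids.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the original collection.
train_docs = [
    "cats and dogs are pets",
    "dogs chase cats",
    "stocks and bonds are investments",
    "investors trade stocks",
]
# Hypothetical new document that was not part of the collection.
new_doc = ["dogs and cats play"]

# Fit the vectorizer on the original collection only.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)

# n_clusters=2 is an arbitrary choice for this toy example.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(X_train)

# Reuse the fitted vectorizer so the new document shares the same vocabulary.
X_new = vectorizer.transform(new_doc)

# Distance from the new document to each centroid, shape (1, n_clusters).
distances = model.transform(X_new)

# Index of the closest centroid for the new document.
label = model.predict(X_new)

print(distances)
print(label)
```

The centroids themselves are available as model.cluster_centers_ if you would rather compute a different measure (cosine similarity, for example) by hand.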
Upvotes: 1