Reputation: 35
I have been trying to fit KMeans on my training set and then predict clusters for my test set, but I haven't been able to get it working after at least a week of trying. I'm wondering whether I'm misinterpreting how KMeans is used. I am told it's unsupervised. Does that mean it cannot be used to predict clusters for new data, even once it knows how the training data is clustered?
Thank you.
Upvotes: 2
Views: 8312
Reputation: 2812
Yes, you can use k-means to predict clusters. Once you have clustered your training data, you get one cluster center for each of the clusters you asked for. E.g., if you have chosen k=3, your dataset will be divided into 3 clusters and hence you will receive 3 cluster centers.
Now you can take your test data and, for each test data point, compute the Euclidean distance to each of the three cluster centers. The center with the minimum distance gives the predicted cluster for that point.
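As a rough sketch of that nearest-center rule: the centers and test points below are made-up values for illustration, and assign_to_nearest_center is a hypothetical helper name, not a library function.

```python
import numpy as np

def assign_to_nearest_center(points, centers):
    # distances[i, j] is the Euclidean distance from points[i] to centers[j]
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    # for each point, pick the index of the closest center
    return np.argmin(distances, axis=1)

# made-up cluster centers (k=3) and test points, for illustration only
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
test_points = np.array([[0.5, -0.2], [4.0, 6.0], [9.0, 1.0]])

predicted = assign_to_nearest_center(test_points, centers)
print(predicted)  # [0 1 2]
```

In practice the centers would come from clustering your training data rather than being typed in by hand.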
If you are using scikit-learn, KMeans also has a predict method, which does essentially the above.
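A small check of that claim, assuming synthetic data (three made-up blobs generated with NumPy; the variable names are only for illustration). It compares the labels from predict with a manual nearest-center assignment using the fitted cluster_centers_:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# synthetic training data: three well-separated blobs (illustrative only)
blob_means = ([0, 0], [5, 5], [10, 0])
train = np.vstack([rng.normal(loc=m, scale=0.5, size=(50, 2)) for m in blob_means])
# synthetic test points drawn near the second blob
test = rng.normal(loc=[5, 5], scale=0.5, size=(10, 2))

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(train)

# labels from scikit-learn
sk_labels = model.predict(test)

# labels from the manual rule: index of the closest fitted center
distances = np.linalg.norm(test[:, None, :] - model.cluster_centers_[None, :, :], axis=2)
manual_labels = np.argmin(distances, axis=1)

print(np.array_equal(sk_labels, manual_labels))
```

Note that the cluster indices themselves are arbitrary; which blob ends up as cluster 0, 1, or 2 depends on the initialization.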
Upvotes: 7
Reputation: 116
KMeans is an unsupervised ML model. That means no labelled data is needed for training or for prediction. It takes the training data, groups it into clusters according to the model's settings (such as the number of clusters), and assigns a cluster label to each training point.
You can then pass new values to the trained model, and it predicts the label of the closest cluster for each input. Here is an example Python code snippet.
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# data is your training data, test is the data to predict (2D arrays with the same columns)
scaler = StandardScaler().fit(data) # learn the scaling from the training data only
model = KMeans(n_clusters=2)
model = model.fit(scaler.transform(data))
print(model.labels_) # cluster index for each training row. you can map these to meaningful labels
print(model.predict(scaler.transform(test))) # reuse the training scaler so test is scaled the same way
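The same fit-then-predict flow can also be sketched with a scikit-learn Pipeline, so that whatever scaling is learned from the training data is automatically reused on the test data. The data below is synthetic and only for illustration; swap in your own arrays.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# synthetic training data: two blobs around (0, 0) and (8, 8), illustrative only
data = np.vstack([
    rng.normal(0.0, 1.0, size=(40, 2)),
    rng.normal(8.0, 1.0, size=(40, 2)),
])
# two made-up test points, one near each blob
test = np.array([[0.2, -0.1], [7.9, 8.3]])

# scaler and KMeans are fitted together on the training data
pipeline = make_pipeline(StandardScaler(), KMeans(n_clusters=2, n_init=10, random_state=0))
pipeline.fit(data)

# predict applies the training-set scaling to test, then assigns the nearest center
test_labels = pipeline.predict(test)
print(test_labels)
```

The two test points should land in different clusters, though which one is called 0 and which 1 is arbitrary.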
Upvotes: 3