Reputation: 266
I have performed k-modes clustering on categorical variables in historical data, because I wanted to see which clusters the data falls into. Now that I have the output, when new data comes in, is there a way to predict which cluster it will fall into?
One option: since I have each row together with the cluster it was assigned to, I could use that as training data and do supervised learning. But I want to know whether any method exists that uses the existing cluster output directly to predict the cluster of new rows (a sort of semi-supervised learning).
I am not able to share any data or output since I am working for a client, but any direction on how to approach this would be very helpful. I have been researching this for quite some time but couldn't find a suitable solution.
Upvotes: 1
Views: 1333
Reputation: 77464
Most clustering algorithms cannot predict for new data.
KMeans and GMM are exceptions, and k-modes should work like k-means: assign a new point to the cluster whose mode is most similar to it.
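A minimal sketch of that nearest-mode assignment, assuming the modes from the earlier k-modes run are available as tuples of category values and that similarity is measured with the simple matching dissimilarity (count of attributes that differ) that k-modes uses. The `MODES` values and the function names are hypothetical, for illustration only. If the clustering was done with the Python `kmodes` package, its `KModes` estimator also provides a `predict` method that performs this assignment for you.

```python
from typing import Sequence

# Hypothetical cluster modes from a previous k-modes run (one tuple per cluster).
MODES = [
    ("red", "small", "metal"),
    ("blue", "large", "wood"),
    ("green", "small", "plastic"),
]


def matching_dissimilarity(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the attributes on which two categorical records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def predict_cluster(record: Sequence[str], modes: Sequence[Sequence[str]]) -> int:
    """Return the index of the mode closest to `record`.

    Ties are broken in favour of the lowest cluster index.
    """
    distances = [matching_dissimilarity(record, m) for m in modes]
    return min(range(len(modes)), key=lambda i: distances[i])


if __name__ == "__main__":
    # Differs from mode 1 only on the last attribute, so it is assigned to cluster 1.
    print(predict_cluster(("blue", "large", "metal"), MODES))
```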
But when you use clustering, you really should analyze the resulting clusters and double-check them, because a clustering is rarely entirely right. Often you'll want to keep some clusters from run A, some from run B, and so on, whichever make sense for your data. Then train a classifier on the reviewed, cleaned-up clusters and use it for prediction.
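The answer leaves the choice of classifier open. As one hypothetical sketch, here is a categorical naive Bayes with Laplace smoothing (a technique chosen for illustration, not one the answer prescribes), trained on rows labelled with the reviewed cluster names. The records, labels and function names are made up; in practice a library model such as scikit-learn's `CategoricalNB`, or a tree-based classifier, may be a better fit.

```python
from collections import Counter, defaultdict
from math import log


def fit_categorical_nb(records, labels, alpha=1.0):
    """Count class frequencies and per-class value frequencies for each feature."""
    n_features = len(records[0])
    class_counts = Counter(labels)
    # value_counts[c][j][v]: how often feature j took value v in class c
    value_counts = defaultdict(lambda: [Counter() for _ in range(n_features)])
    vocab = [set() for _ in range(n_features)]  # distinct values seen per feature
    for rec, c in zip(records, labels):
        for j, v in enumerate(rec):
            value_counts[c][j][v] += 1
            vocab[j].add(v)
    return {"class_counts": class_counts, "value_counts": value_counts,
            "vocab": vocab, "alpha": alpha, "n": len(labels)}


def predict_nb(model, record):
    """Return the class with the highest smoothed log posterior for `record`."""
    best_class, best_score = None, float("-inf")
    for c, n_c in model["class_counts"].items():
        score = log(n_c / model["n"])  # log prior
        for j, v in enumerate(record):
            k = len(model["vocab"][j])
            count = model["value_counts"][c][j][v]  # 0 for unseen values
            score += log((count + model["alpha"]) / (n_c + model["alpha"] * k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical rows labelled with reviewed, cleaned-up cluster names.
RECORDS = [
    ("red", "small", "metal"),
    ("red", "small", "wood"),
    ("blue", "large", "wood"),
    ("blue", "large", "plastic"),
]
LABELS = ["A", "A", "B", "B"]

if __name__ == "__main__":
    model = fit_categorical_nb(RECORDS, LABELS)
    print(predict_nb(model, ("red", "small", "plastic")))
```

Laplace smoothing keeps a value that never appeared with a given cluster from zeroing out that cluster's probability, which matters with many sparse categorical columns.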
Upvotes: 2