Reputation: 13
I'm working on a project where I generate embeddings with Universal Sentence Encoder and pass them as input to KMeans from sklearn.cluster.
The problem is that I first ran this on only a few documents, and now I want to run it on more documents. When I tried running it with other documents, the cluster labels differed from the original run. This happens every time I change the dataset. I want to keep the labels consistent with the original outcome, because that run used the best input documents and gave the best results.
Is there any way to define beforehand which label should be assigned to which type of input?
For example, can I make KMeans always assign label 3 to one particular type of embedding?
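To make it concrete, here is a minimal sketch of the kind of thing I have in mind, not a tested solution. It assumes the original run can be kept as a reference model. The random vectors stand in for real Universal Sentence Encoder embeddings, and the variable names and the choice of 4 clusters are made up for illustration. It shows two options: reusing the fitted model's `predict` on new embeddings, or refitting on all the data with `init` set to the original centroids so that cluster i starts from the old cluster i.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in data (assumption): random vectors instead of real
# Universal Sentence Encoder embeddings, grouped around 4 centers.
rng = np.random.default_rng(0)
true_centers = rng.normal(size=(4, 16)) * 10.0
original_embeddings = np.vstack(
    [c + rng.normal(size=(25, 16)) for c in true_centers]
)
new_embeddings = np.vstack(
    [c + rng.normal(size=(10, 16)) for c in true_centers]
)

# Fit once on the original documents and keep this model as the reference.
reference = KMeans(n_clusters=4, n_init=10, random_state=0)
reference.fit(original_embeddings)

# Option 1: do not refit. Assign each new embedding to the nearest
# original centroid, so label i always means the original cluster i.
new_labels = reference.predict(new_embeddings)

# Option 2: refit on all data, but start from the original centroids.
# With an explicit init array, n_init=1 is the sensible setting.
combined = np.vstack([original_embeddings, new_embeddings])
refit = KMeans(n_clusters=4, init=reference.cluster_centers_, n_init=1)
refit.fit(combined)
refit_labels = refit.labels_
```

With option 1 the labels cannot change, because the centroids are frozen. With option 2 the centroids can still move during refitting, so the labels should stay aligned only as long as the new documents do not shift the clusters very much.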
Upvotes: 1
Views: 27