Shashank Shandilya

Reputation: 13

Set sample points for each cluster in kmeans using Python

So I'm working on a project where I use embeddings generated from the Universal Sentence Encoder as input to the KMeans clustering in sklearn.cluster.

At first I ran this on only a few documents. Now I want to run it on more documents, but when I tried that, the cluster labels differed from the original run. This happens every time I change the dataset. I want to keep the labels consistent with the original outcome, since that run used the best input documents and gave the best results.

Is there any way to define beforehand which label should be assigned to which type of input?

For example, is there a way to make the kmeans label always be 3 for one type of embedding?
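To make the goal concrete, here is a rough sketch of two things that might keep labels stable across datasets: reusing the model fitted on the original documents via `predict`, or refitting on the new documents with `init` set to the original centroids. The random vectors below are only stand-ins for real Universal Sentence Encoder embeddings, and `fake_embeddings`, `original`, and `new_docs` are made-up names for illustration, not part of my actual project.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Assumption: three well-separated groups of 8-dimensional vectors stand in
# for real Universal Sentence Encoder embeddings (which are 512-dimensional).
group_means = np.array([[0.0] * 8, [5.0] * 8, [10.0] * 8])


def fake_embeddings(n_per_group):
    """Hypothetical helper: draw n_per_group noisy points around each group mean."""
    return np.vstack(
        [m + 0.1 * rng.standard_normal((n_per_group, 8)) for m in group_means]
    )


original = fake_embeddings(20)   # the first, "best" set of documents
new_docs = fake_embeddings(30)   # a later, larger set of documents

# Fit once on the original documents; this fixes which centroid has which label.
reference = KMeans(n_clusters=3, n_init=10, random_state=0).fit(original)

# Option 1: do not refit at all, just assign new documents to the
# nearest of the original centroids.
labels_new = reference.predict(new_docs)

# Option 2: refit on the new documents, but start from the original
# centroids so that cluster i begins where the old cluster i was.
seeded = KMeans(
    n_clusters=3, init=reference.cluster_centers_, n_init=1
).fit(new_docs)
labels_seeded = seeded.labels_
```

With option 1 the label numbering cannot change, because the centroids are never recomputed. With option 2 the centroids can move, so the numbering is only likely to match if the new data resembles the original. In both cases the integers themselves are arbitrary, so getting a specific number like 3 for one group would presumably need a separate lookup from cluster index to the label I want.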

Upvotes: 1

Views: 27

Answers (0)
