Anando Haider

Reputation: 43

How to select the cluster with the maximum frequency in k-means

I have clustered Gensim word2vec vectors with k-means, using k = 3. Now I want to retrieve the cluster with the highest frequency (the one containing the most points), together with the values assigned to it.

import gensim
from gensim.models import Word2Vec
import nltk
from nltk.tokenize import sent_tokenize
from sklearn.cluster import KMeans
import numpy as np
text = "Thank you for keeping me updated on this issue. I'm happy to hear that the issue got resolved after all and you can now use the app in its full functionality again. Also many thanks for your suggestions. We hope to improve this feature in the future. In case you experience any further problems with the app, please don't hesitate to contact me again."
sentences = sent_tokenize(text)
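# Split each sentence into a list of word tokens
word_text = [sentence.split() for sentence in sentences]
model = Word2Vec(word_text, min_count=1)
# gensim 3.x API; in gensim 4+ `vocab` was removed, use model.wv.vectors instead
x = model.wv[model.wv.vocab]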
n_clusters = 3
kmeans = KMeans(n_clusters=n_clusters)
kmeans = kmeans.fit(x)

Upvotes: 0

Views: 227

Answers (1)

theletz

Reputation: 1795

You can get the cluster label of each data point with:

labels = kmeans.labels_

Now you can find the number of samples in each cluster using:

np.unique(labels, return_counts=True)

and you can find the cluster centers using kmeans.cluster_centers_
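To go from the counts to the most populated cluster and its members, something like the following should work. It is only a sketch: `largest_cluster`, `demo_labels` and `demo_words` are made-up names for illustration, and the small label array stands in for `kmeans.labels_`.

```python
import numpy as np

def largest_cluster(labels, items):
    """Return (cluster_id, size, members) for the most populated cluster."""
    labels = np.asarray(labels)
    # ids: sorted distinct cluster labels; counts: number of points per label
    ids, counts = np.unique(labels, return_counts=True)
    top = ids[np.argmax(counts)]  # on a tie, the smallest label wins
    members = [item for item, lab in zip(items, labels) if lab == top]
    return int(top), int(counts.max()), members

# Hypothetical stand-ins for kmeans.labels_ and the words behind the rows of x
demo_labels = [0, 2, 2, 1, 2, 0]
demo_words = ["thank", "you", "for", "keeping", "me", "updated"]
cluster_id, size, members = largest_cluster(demo_labels, demo_words)
print(cluster_id, size, members)  # 2 3 ['you', 'for', 'me']
```

With the question's code you would pass `kmeans.labels_` as `labels` and, as `items`, the words in the same order as the rows given to `fit` (presumably `list(model.wv.vocab)` under gensim 3.x, since `x` was built by iterating over that dict).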

Upvotes: 1
