Reputation: 183
I am trying to cluster terms present in text documents using spectral clustering. After clustering, I would like to get the terms present in each cluster.
The code I tried is as follows:
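from sklearn.cluster import SpectralClustering
from sklearn.feature_extraction.text import TfidfVectorizer

true_k = 4
vectorizer = TfidfVectorizer(stop_words='english', decode_error='ignore')
X = vectorizer.fit_transform(documents)  # documents: list of text strings; X has one row per document
terms = vectorizer.get_feature_names_out()  # was get_feature_names() before scikit-learn 1.0
model = SpectralClustering(n_clusters=true_k, eigen_solver='arpack', affinity='nearest_neighbors')
model.fit(X)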
From here I am stuck on getting the terms for each cluster; 'labels_' doesn't help, as it only returns the cluster labels.
Edit: Solved. The code below did the trick:
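import numpy as np

print("Terms per cluster:")
for i in range(true_k):
    print("Cluster %d:" % i)
    # .indices holds the column (term) index of every non-zero entry in the
    # rows (documents) assigned to cluster i; np.unique drops the repeats
    T = np.unique(X[model.labels_ == i].indices)
    for ind in T:
        print(terms[ind])
    print()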
Upvotes: 2
Views: 808
Reputation: 4039
If I understand you correctly, you must first fit the model, i.e. model.fit(X). To access the rows (documents) of X belonging to cluster k according to the fitted model, do X[model.labels_==k].
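To make the indexing concrete: model.labels_ has one entry per row of X, so model.labels_ == k is a boolean mask that keeps exactly the rows assigned to cluster k. A small sketch on a hypothetical 4x3 matrix, with a hard-coded labels array standing in for model.labels_:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 4x3 document-term matrix (rows = documents, columns = terms)
X = csr_matrix(np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.3, 0.7],
]))
labels = np.array([0, 0, 1, 1])  # hypothetical stand-in for model.labels_

mask = labels == 1               # boolean mask, one entry per row of X
X_k = X[mask]                    # keeps only rows 2 and 3
print(X_k.shape)                 # (2, 3)
print(X_k.toarray())
```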
Upvotes: 1