sand

Reputation: 183

Scikit learn spectral clustering get items per cluster

I am trying to cluster the terms present in text documents using spectral clustering. After clustering, I would like to get the terms present in each cluster.

The code I tried is as follows:

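    from sklearn.cluster import SpectralClustering
    from sklearn.feature_extraction.text import TfidfVectorizer

    true_k = 4
    vectorizer = TfidfVectorizer(stop_words='english', decode_error='ignore')
    X = vectorizer.fit_transform(documents)  # rows are documents, columns are terms
    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
    terms = vectorizer.get_feature_names_out()
    model = SpectralClustering(n_clusters=true_k, eigen_solver='arpack', affinity='nearest_neighbors')
    model.fit(X)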

From here I am stuck on how to get the terms per cluster. Using 'labels_' doesn't help, as it only returns the cluster labels.

Edit: Solved. The code below did the trick:

    print("Terms per cluster:")
    for i in range(true_k):
        print("Cluster %d:" % i, end=' ')
        # column indices of the non-zero entries in the rows assigned to cluster i
        T = X[model.labels_ == i].indices
        for ind in T:
            print(terms[ind])
        print()

Upvotes: 2

Views: 808

Answers (1)

Matt Hancock

Reputation: 4039

If I understand you correctly, you must first fit the model, i.e. model.fit(X). To access the rows of X assigned to cluster k by the fitted model, use X[model.labels_ == k].
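As a rough sketch of how that boolean mask can be turned into a list of terms per cluster: the helper name `terms_per_cluster`, the toy matrix, and the hard-coded `labels` below are all made up for illustration. In the question's setup, `X` would be the TF-IDF matrix, `labels` would be `model.labels_`, and `terms` would come from the vectorizer. It assumes `X` is a SciPy sparse matrix with documents as rows.

```python
import numpy as np
from scipy.sparse import csr_matrix


def terms_per_cluster(X, labels, terms):
    """Map each cluster label to the sorted, de-duplicated terms of its documents.

    Hypothetical helper, not part of scikit-learn.
    """
    labels = np.asarray(labels)
    result = {}
    for k in np.unique(labels):
        rows = X[labels == k]                 # documents assigned to cluster k
        cols = np.unique(rows.nonzero()[1])   # columns (terms) with a non-zero weight
        result[int(k)] = [terms[j] for j in cols]
    return result


# Toy stand-ins: 4 documents x 5 terms.
terms = ["apple", "banana", "cpu", "gpu", "kernel"]
X = csr_matrix(np.array([
    [0.7, 0.7, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.6, 0.8],
]))
labels = np.array([0, 0, 1, 1])  # assumed labels; in practice use model.labels_

clusters = terms_per_cluster(X, labels, terms)
for k, cluster_terms in clusters.items():
    print("Cluster %d: %s" % (k, ", ".join(cluster_terms)))
```

Note that the clusters here are clusters of documents, so a term can show up under more than one cluster if documents in different clusters share it. Using `np.unique` on the column indices only removes repeats within a single cluster.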

Upvotes: 1
