bytebiscuit

Display individual kmeans clusters from the clustering vector using wordcloud in R

I've run k-means clustering in R on a document-term matrix. The clustering vector is as follows:

 doc1.txt doc10.txt doc11.txt doc12.txt doc13.txt doc14.txt doc15.txt 
        3         3         3         3         1         3         3 
doc16.txt doc17.txt doc18.txt doc19.txt  doc2.txt doc20.txt doc21.txt 
        3         3         3         2         3         3         3 
doc22.txt doc23.txt doc24.txt doc25.txt doc26.txt doc27.txt doc28.txt 
        3         3         3         3         3         3         3 
doc29.txt  doc3.txt doc30.txt  doc4.txt  doc5.txt  doc6.txt  doc7.txt 
        3         3         3         1         1         1         3 
 doc8.txt  doc9.txt 
        3         3  
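For anyone who wants to reproduce a vector like this, here is a minimal sketch. It assumes a plain numeric matrix with one row per document. The random toy data, the name dtm, and the choice of 3 centres are all hypothetical. If the matrix is a tm DocumentTermMatrix, it would need to go through as.matrix() first.

```r
# Hypothetical toy data: 30 documents x 5 terms of random counts
set.seed(42)  # kmeans() picks random starting centres, so fix the seed
dtm <- matrix(rpois(30 * 5, lambda = 5), nrow = 30,
              dimnames = list(paste0("doc", 1:30, ".txt"), paste0("term", 1:5)))

km <- kmeans(dtm, centers = 3)

# km$cluster is an integer vector of cluster labels, named by the row names of dtm
clusters <- km$cluster
print(clusters)
```

Because kmeans() copies the row names onto its cluster component, the document names shown in the printed vector come straight from the matrix.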

The document-term matrix is as follows:

     term1  term2  term3  term4  term5 
doc1   5      3     2      1      4
doc2   3      4     12     11     21
doc3   2      3     4      12     16
doc4   1      3     0      10     15
doc5   4      10    0      20     4
  .  
  .
  .

How can I access the rows for all documents in, say, cluster 3 and return them as a matrix? I'm trying to plot the term frequencies across all documents in cluster 3 as a word cloud, using min.freq = 3.

Many thanks

Answers (1)

Ryan Walker

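If your cluster label vector (for example, the cluster component returned by kmeans()) is called clusters, you can get the names of the documents in cluster 3 with the .txt extension removed, so that they match the row names of the matrix: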

docs3 <- gsub(".txt", "", names(which(clusters == 3)), fixed = TRUE)
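gsub() treats its pattern as a regular expression by default. In ".txt", the dot therefore matches any character, not only a literal period. The short check below runs on two file names. The second name, mytxtfile.txt, is hypothetical. It shows how an unescaped pattern can damage a name, and gives two safe alternatives.

```r
files <- c("doc1.txt", "mytxtfile.txt")  # "mytxtfile.txt" is a hypothetical name

# Regex pattern: "." matches any character, so "ytxt" inside the name is removed too
loose <- gsub(".txt", "", files)                 # "doc1" "mfile"

# Literal match: only the exact string ".txt" is removed
literal <- gsub(".txt", "", files, fixed = TRUE) # "doc1" "mytxtfile"

# Escaped dot anchored at the end: strips only a trailing ".txt" extension
anchored <- sub("\\.txt$", "", files)            # "doc1" "mytxtfile"
```

The anchored form is the stricter of the two, because it leaves any ".txt" in the middle of a name untouched.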

If your document-term matrix (documents as rows) is called DTM, you can get the submatrix for the documents in cluster 3 with

DTM3 <- DTM[docs3,]
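To finish the word cloud the question asks for, one approach is to sum each term's counts across the cluster's documents and pass those totals to wordcloud(). The sketch below uses the five example rows from the question. It assumes the column names are term1 to term5 and that DTM is a base R matrix. If DTM is a tm DocumentTermMatrix, convert DTM3 with as.matrix() before calling colSums(). The plot is drawn only when the wordcloud package is installed.

```r
# Toy document-term matrix copied from the question (first five documents only)
DTM <- matrix(c(5,  3,  2,  1,  4,
                3,  4, 12, 11, 21,
                2,  3,  4, 12, 16,
                1,  3,  0, 10, 15,
                4, 10,  0, 20,  4),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("doc", 1:5), paste0("term", 1:5)))

# Cluster labels for those documents, as printed in the question's clustering vector
clusters <- c(doc1.txt = 3, doc2.txt = 3, doc3.txt = 3, doc4.txt = 1, doc5.txt = 1)

# Names of the documents in cluster 3, with the trailing ".txt" stripped
docs3 <- sub("\\.txt$", "", names(which(clusters == 3)))

# drop = FALSE keeps a matrix even if the cluster holds a single document
DTM3 <- DTM[docs3, , drop = FALSE]

# Total frequency of each term across the documents in cluster 3
freqs <- colSums(DTM3)
print(freqs)  # term1 10, term2 10, term3 18, term4 24, term5 41

# Plot only if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = names(freqs), freq = freqs, min.freq = 3)
}
```

Summing the columns collapses the cluster's documents into one frequency per term. That single frequency vector is the input wordcloud() needs, and min.freq then drops terms whose total falls below 3.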
