Reputation: 123
I have used some R code I found online to make a K-Means cluster plot, as follows:
dtmr <-DocumentTermMatrix(docs,control=list(wordLengths=c(4,15), bounds = list(global = c(50,500))))
## do tfxidf
dtm_tfxidf <- weightTfIdf(dtmr)
### k-means (this uses euclidean distance)
m <- as.matrix(dtm_tfxidf)
rownames(m) <- 1:nrow(m)
### don't forget to normalize the vectors so Euclidean makes sense
norm_eucl <- function(m) m/apply(m, MARGIN=1, FUN=function(x) sum(x^2)^.5)
m_norm <- norm_eucl(m)
### cluster into 5 clusters
cl <- kmeans(m_norm, 5)
table(cl$cluster)
### show clusters using the first 2 principal components
plot(prcomp(m_norm)$x, col=cl$cl, text(m_norm, mpg, row.names(m)))
This does give me a plot of the 5 clusters. I am just wondering how I can add labels to show what each dot is.
On a side note, is there any way to see what these clusters are? The table(cl$cluster) line just prints five numbers, and I do not know what they mean. My data is just over 400 text documents.
Upvotes: 1
Views: 4292
Reputation: 251
The problems I can see are that the text() call is inside the plot() call when it should come after it, and that the x and y passed to text() are not the ones used to generate the plot, which are the result of prcomp().
I'm using mtcars as a dataset:
df <- mtcars
### k-means (this uses euclidean distance)
m <- as.matrix(df)
rownames(m) <- 1:nrow(m)
### don't forget to normalize the vectors so Euclidean makes sense
norm_eucl <- function(m) m/apply(m, MARGIN=1, FUN=function(x) sum(x^2)^.5)
m_norm <- norm_eucl(m)
### cluster into 5 clusters
cl <- kmeans(m_norm, 5)
table(cl$cluster)
### show clusters using the first 2 principal components
# do the PCA outside the plot function for now
PCA <- prcomp(m_norm)$x
# plot the first two components, then add labels at the same coordinates
plot(PCA, col=cl$cluster)
text(x=PCA[,1], y=PCA[,2], cex=0.6, pos=4, labels=row.names(m))
For the second question, the cluster assignments are in cl$cluster. The table() call just counts how many members each cluster has, which is why it reports five numbers for you.
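To see which rows ended up in which cluster, one option is to split the row names by cl$cluster. The following is a minimal sketch on mtcars rather than the asker's document-term matrix. The names members and top_cols, the set.seed() call, and nstart = 10 are my own additions for illustration, and the car names are kept as row names so the output is readable.

```r
# Sketch on mtcars; with a document-term matrix, rows would be documents
# and columns would be terms.
m <- as.matrix(mtcars)
m_norm <- m / sqrt(rowSums(m^2))   # scale each row to unit Euclidean length

set.seed(42)                        # kmeans starts from random centers
cl <- kmeans(m_norm, centers = 5, nstart = 10)

# One character vector of row names per cluster (hypothetical name: members)
members <- split(rownames(m_norm), cl$cluster)
print(members)

# Each row of cl$centers is the column-wise mean of that cluster's rows.
# The columns with the largest values give a rough idea of what
# characterises a cluster (hypothetical name: top_cols).
top_cols <- apply(cl$centers, 1, function(ctr) {
  names(sort(ctr, decreasing = TRUE))[1:3]
})
print(top_cols)
```

With tf-idf weighted text, the same idea applied to cl$centers would presumably list the highest-weighted terms per cluster, which is often enough to guess what each cluster is about.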
Upvotes: 2