Soumajit

Reputation: 376

k-means plot with the original data in R

I am currently exploring the kmeans function. I have a simple text file (test.txt) with the following entries, which can be split into 2 clusters.

    1 
    2
    3
    8
    9
   10

How can I plot the results of the kmeans function (using the plot function) along with the original data? I would also like to see how the clusters are distributed relative to their centroids.

Upvotes: 1

Views: 8980

Answers (1)

Mehdi Nellen

Reputation: 8994

This is the example from example(kmeans):

# Generate example data: two 2-D point clouds of 50 points each
test <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
              matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(test) <- c("V1", "V2")

# Store the kmeans result in a variable called cl
# (the outer parentheses also print it)
(cl <- kmeans(test, 2))

# Plot the data colored by cluster, then add the centroids as stars
plot(test, col = cl$cluster)
points(cl$centers, col = 1:2, pch = 8, cex = 2)

Edit

OP has some additional questions:

(cl <- kmeans(test, 2))
plot(test, col = cl$cluster)
points(cl$centers, col = 1:2, pch = 8, cex = 2)

The above code results in: case1

(cl <- kmeans(test[,1], 2))
plot(test[,1], col = cl$cluster)
points(cl$centers, col = 1:2, pch = 8, cex = 2)

The above code results in: case2

(cl <- kmeans(test[,1], 2))
plot(cbind(0,test[,1]),  col = cl$cluster)
points(cbind(0,cl$centers), col = 1:2, pch = 8, cex = 2)

The above code results in: case3

Explanation

In case 1 the data has two dimensions (V1, V2), so the centroids have two coordinates, just like every point in the plot. In case 2 the data is one-dimensional (V1), just like your data. R gives every point an index and uses it as the x value. The centroids also have only one coordinate each, so they are plotted against their own indices (1 and 2), which is why you see them all the way to the left of the plot. Case 3 is what one-dimensional data actually looks like if you plot it in only one dimension.
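
To make the index behaviour in case 2 concrete, here is a small sketch. The variable names v1 and xy and the fixed seed are assumptions for illustration only. It inspects the coordinates R derives for a plain vector and, as one possible alternative to points(), marks the one-dimensional centroids with horizontal lines so they line up with the data:

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# One-dimensional example data, similar in spirit to test[, 1]
v1 <- c(rnorm(50, sd = 0.3), rnorm(50, mean = 1, sd = 0.3))
cl <- kmeans(v1, 2)

# xy.coords() is the helper plot() and points() use to work out coordinates.
# For a plain vector the values become y and the x values are just 1..n.
xy <- xy.coords(v1)
head(xy$x)

# Index plot as in case 2, with the centroids drawn as horizontal lines
# at their y values instead of as points at x = 1 and x = 2
plot(v1, col = cl$cluster)
abline(h = cl$centers, col = 1:2, lty = 2)
```

The dashed lines share the colors of their clusters because row i of cl$centers belongs to cluster i.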

Conclusion

Your data is one-dimensional. If you plot it in two dimensions you get something like case 2, where the x values are index values supplied by R. Plotting it like that doesn't make much sense.
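
As a rough sketch of how this could look for the six values from the question, the code below clusters them and draws everything on a single axis, in the same spirit as case 3. The values are typed in directly rather than read from test.txt, and the seed, nstart = 10 and the axis labels are assumptions chosen for illustration:

```r
# Values from the question; they could also be read with scan("test.txt")
x <- c(1, 2, 3, 8, 9, 10)

set.seed(42)  # assumption: fixed seed for reproducibility
# nstart = 10 tries several random starts and keeps the best solution
cl <- kmeans(x, centers = 2, nstart = 10)

# Put the data on the horizontal axis and fix y at 0, so that only
# one dimension carries information
plot(x, rep(0, length(x)), col = cl$cluster,
     yaxt = "n", xlab = "value", ylab = "")

# Centroids as stars on the same line; [, 1] turns the one-column
# matrix of centers into a plain vector
points(cl$centers[, 1], rep(0, 2), col = 1:2, pch = 8, cex = 2)
```

For these values the centroids should land at the group means, 2 and 9, between the two visibly separated groups of points.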

Upvotes: 3
