Reputation: 719
While performing clustering using R I have come across an error. I have a dataset d which is a distance matrix. Variable fit is obtained by the following
fit <- kmeans(d, centers = 2) # assume that the number of clusters lies between 1 and nrow(x)
clusplot(d, fit$cluster, color=TRUE, shade = TRUE, lines=0)
The error that is being displayed is
Error in mkCheckX(x, diss) : x is not a data matrix
The matrix d is given by
structure(c(2, 4, 6, 2, 4, 2), Size = 4L, Diag = FALSE, Upper = FALSE,
method = "euclidean", call = dist(x = DATA, method = "euclidean"),
class = "dist")
Upvotes: 2
Views: 7535
Reputation: 4339
The clusplot function accepts as its first argument either a matrix or data frame, or a dissimilarity matrix (or a distance matrix), depending on the value of the diss argument, which is FALSE by default. See ?clusplot for more information.
So, you need to use:
d = dist(DATA) # for a distance matrix or d = daisy(DATA) for a dissimilarity matrix
clusplot(d, diss=TRUE, fit$cluster, color=TRUE, shade = TRUE, lines=0)
or
clusplot(DATA, fit$cluster, color=TRUE, shade = TRUE, lines=0)
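To make the two options concrete, here is a minimal, self-contained sketch. The DATA matrix is made up for illustration (it is not the asker's data), and the clustering is run on DATA itself rather than on the distance matrix, which is an assumption on my part.

```r
library(cluster)  # provides clusplot() and daisy()

set.seed(1)
# Hypothetical data: two well-separated groups of 10 points in 2-D
DATA <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
              matrix(rnorm(20, mean = 5), ncol = 2))

fit <- kmeans(DATA, centers = 2)  # assumption: cluster the raw data
d <- dist(DATA)                   # object of class "dist"

# Option 1: pass the dist object and tell clusplot it is a dissimilarity
clusplot(d, fit$cluster, diss = TRUE, color = TRUE, shade = TRUE, lines = 0)

# Option 2: pass the raw data (diss is left at its default)
clusplot(DATA, fit$cluster, color = TRUE, shade = TRUE, lines = 0)
```

Either call should draw a plot instead of stopping with the "x is not a data matrix" error.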
You get the error because d is not recognized as a matrix by the function mkCheckX, since to R it is an object of class dist (not a matrix!). If you try is.matrix(d) you should get FALSE.
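You can check this yourself with a small sketch. The DATA below is hypothetical: four points on a line, chosen only because dist() then reproduces the distances 2, 4, 6, 2, 4, 2 shown in the question.

```r
# Hypothetical data: four points on a line at 0, 2, 4, 6
DATA <- matrix(c(0, 2, 4, 6), ncol = 1)
d <- dist(DATA, method = "euclidean")

class(d)      # "dist"
is.matrix(d)  # FALSE

# as.matrix() expands the dist object into a full symmetric matrix
m <- as.matrix(d)
is.matrix(m)  # TRUE
dim(m)        # 4 4
```

Note that m is still a matrix of distances, not a data matrix, so passing it to clusplot would still need diss=TRUE to be interpreted correctly.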
Also, do not expect the same results from both methods: when you provide the data matrix, the plot is produced in a different way (based on a principal component decomposition, judging from the code).
If you check the help for dist, you can see that you can use different methods ("euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski") to calculate the distance, and you should expect different clusterings when you change the way the distance is calculated.
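As a quick illustration of how much the method matters, here is a small sketch with two made-up points, (0, 0) and (3, 4). The variable names are only for illustration.

```r
# Hypothetical data: two points, (0, 0) and (3, 4)
DATA <- matrix(c(0, 0,
                 3, 4), ncol = 2, byrow = TRUE)

d_euc <- dist(DATA, method = "euclidean")  # sqrt(3^2 + 4^2) = 5
d_man <- dist(DATA, method = "manhattan")  # |3| + |4|       = 7
d_max <- dist(DATA, method = "maximum")    # max(|3|, |4|)   = 4

c(euclidean = as.numeric(d_euc),
  manhattan = as.numeric(d_man),
  maximum   = as.numeric(d_max))
```

The same pair of points ends up 5, 7 or 4 units apart depending on the method, so any clustering built on those distances can change as well.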
In summary, your distance matrix is not a matrix as far as R is concerned, so you got the error you saw.
Upvotes: 2