Aman Mathur

Error during clusplot in R

While performing clustering in R, I came across an error. I have a dataset d, which is a distance matrix. The variable fit is obtained as follows:

library(cluster)  # provides clusplot()
fit <- kmeans(d, centers = 2)  # assumes the number of clusters lies between 1 and nrow(x)
clusplot(d, fit$cluster, color=TRUE, shade = TRUE, lines=0)

The following error is displayed:

Error in mkCheckX(x, diss) : x is not a data matrix

The matrix d is given by

structure(c(2, 4, 6, 2, 4, 2), Size = 4L, Diag = FALSE, Upper = FALSE,
          method = "euclidean", call = dist(x = DATA, method = "euclidean"),
          class = "dist")

Answers (1)

Ricardo Oliveros-Ramos

The clusplot function accepts as its first argument either a data matrix or data frame, or a dissimilarity (distance) matrix, depending on the value of the diss argument, which is FALSE by default. See ?clusplot for more information.

So you need to use either:

d <- dist(DATA)  # a distance matrix; alternatively, d <- daisy(DATA) gives a dissimilarity matrix
clusplot(d, fit$cluster, diss = TRUE, color = TRUE, shade = TRUE, lines = 0)

or

clusplot(DATA, fit$cluster, color=TRUE, shade = TRUE, lines=0)
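Below is a self-contained sketch of both options. DATA here is a hypothetical two-group matrix generated with rnorm (the asker's DATA is not shown), and fit is assumed to come from kmeans on the raw data; the last line checks that, for numeric data, daisy() with its default Euclidean metric gives the same distances as dist().

```r
library(cluster)  # provides clusplot() and daisy()

# Hypothetical data: two groups of 2-D points, 10 rows each
set.seed(42)
DATA <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
              matrix(rnorm(20, mean = 5), ncol = 2))

# kmeans() takes the number of clusters via `centers`; nstart = 10 tries several random starts
fit <- kmeans(DATA, centers = 2, nstart = 10)

# Option 1: pass a distance object and declare it with diss = TRUE
d <- dist(DATA)
clusplot(d, fit$cluster, diss = TRUE, color = TRUE, shade = TRUE, lines = 0)

# Option 2: pass the data matrix itself (diss = FALSE is the default)
clusplot(DATA, fit$cluster, color = TRUE, shade = TRUE, lines = 0)

# For numeric data, daisy() with its default Euclidean metric matches dist()
all.equal(as.vector(d), as.vector(daisy(DATA)))
```

The same cluster vector is used in both calls; only the object passed as x, and the diss flag that tells clusplot how to interpret it, changes.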

You get the error because d is not recognized as a matrix by the internal function mkCheckX: to R it is an object of class dist, not a matrix. If you try is.matrix(d), you should get FALSE.
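A quick way to check this is to rebuild d from the structure() output in the question. The call attribute is dropped in this sketch because it refers to the asker's DATA object, which is not available. A dist object only stores the lower triangle as a plain vector, while as.matrix() expands it into the full square matrix.

```r
# Rebuild the distance object from the question (without its `call` attribute)
d <- structure(c(2, 4, 6, 2, 4, 2), Size = 4L, Diag = FALSE, Upper = FALSE,
               method = "euclidean", class = "dist")

is.matrix(d)        # FALSE: no dim attribute, just the lower triangle
class(d)            # "dist"

m <- as.matrix(d)   # full symmetric 4 x 4 matrix with zeros on the diagonal
m
```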

Also, do not expect the same plot from both approaches: the cluster assignments are the same, but when you pass the data matrix, clusplot builds the two-dimensional representation differently (based on a principal component decomposition, judging from the source code) than when you pass a dissimilarity matrix.

The help page for dist (?dist) lists the methods you can use to calculate the distance ("euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski"). Changing the method changes the distances, so you should generally expect different clusterings as well.
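To see how much the method matters, here is a small sketch with two hypothetical points, (0, 0) and (3, 4), chosen so the numbers are easy to verify by hand. Each method gives a different distance between the same pair of points, so any clustering built on those distances can change too.

```r
# Two hypothetical points, (0, 0) and (3, 4)
pts <- rbind(c(0, 0), c(3, 4))

methods <- c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski")

# With only two points, dist() returns a single distance for each method
res <- sapply(methods, function(m) as.numeric(dist(pts, method = m)))
res
# euclidean 5, maximum 4, manhattan 7, canberra 2, binary 1,
# minkowski 5 (its default power p = 2 makes it equal to euclidean)
```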

In summary, your distance matrix is not a matrix as far as R is concerned (it is a dist object), which is why you got that error.