Reputation: 1062
I have a large data matrix (33183x1681), each row corresponding to one observation and each column corresponding to the variables.
I applied K-medoids clustering using PAM function in R, and I tried to visualize the clustering results using the built-in plots available with the PAM function. I got this error:
Error in princomp.default(x, scores = TRUE, cor = ncol(x) != 2) :
cannot use cor=TRUE with a constant variable
I think this problem is because of the high dimensionality of the data matrix I'm trying to cluster.
Any thoughts/ideas how to tackle this issue?
Upvotes: 1
Views: 7498
Reputation: 174788
Check out the clara()
function in package cluster which is shipped with all versions of R.
library("cluster")
## generate 500 objects, divided into 2 clusters.
x <- rbind(cbind(rnorm(200,0,8), rnorm(200,0,8)),
cbind(rnorm(300,50,8), rnorm(300,50,8)))
clarax <- clara(x, 2, samples=50)
clarax
> clarax
Call: clara(x = x, k = 2, samples = 50)
Medoids:
[,1] [,2]
[1,] -1.15913 0.5760027
[2,] 50.11584 50.3360426
Objective function: 10.23341
Clustering vector: int [1:500] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
Cluster sizes: 200 300
Best sample:
[1] 10 17 45 46 68 90 99 150 151 160 184 192 232 238 243 250 266 275 277
[20] 298 303 304 313 316 327 333 339 353 358 398 405 410 411 421 426 429 444 447
[39] 456 477 481 494 499 500
Available components:
[1] "sample" "medoids" "i.med" "clustering" "objective"
[6] "clusinfo" "diss" "call" "silinfo" "data"
Note that you should study the help for clara()
(?clara
) in some detail as well as the references cited in order to make the clustering performed by clara()
as close to or identical to pam()
.
Upvotes: 6