iskandarblue

Reputation: 7526

Unexpected clustering errors (partitioning around medoids)

I am using the fpc package to determine the optimal number of clusters. The pamk() function accepts a dissimilarity matrix and does not require the user to specify k. According to the documentation:

pamk(): This calls pam and clara for the partitioning around medoids clustering method (Kaufman and Rousseeuw, 1990) and includes two different ways of estimating the number of clusters.

But when I input two very similar matrices, foo and bar (data below), the function errors out on the second one (bar):

Error in pam(sdata, k, diss = diss, ...) : 
  Number of clusters 'k' must be in {1,2, .., n-1}; hence n >= 2 

What could be causing this error, given that the two input matrices are basically the same? For example:

foo works!

hc <- hclust(as.dist(foo))
plot(hc)
pamk.best <- fpc::pamk(foo)
pamk.best$nc
[1] 2

bar does not

hc <- hclust(as.dist(bar))
plot(hc, main = 'bar dendrogram')
pamk.best <- fpc::pamk(bar)
Error in pam(sdata, k, diss = diss, ...) : 
  Number of clusters 'k' must be in {1,2, .., n-1}; hence n >= 2

Any suggestions would be helpful!

dput(foo)
structure(c(0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 9, 9, 9, 
9, 9, 9, 9, 0, 9, 9, 9, 9, 9, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 9, 9, 9, 
9, 9, 9, 9, 0, 9, 9, 9, 9, 9, 0), .Dim = c(14L, 14L), .Dimnames = list(
    c("etc", "etc", "etc", "etc", "etc", "etc", "etc", "similares", 
    "etc", "etc", "etc", "etc", "etc", "similares"), NULL))

dput(bar)
structure(c(0, 6, 6, 6, 6, 6, 0, 0, 0, 0, 6, 0, 0, 0, 0, 6, 0, 
0, 0, 0, 6, 0, 0, 0, 0), .Dim = c(5L, 5L), .Dimnames = list(c("ramírez", 
"similares", "similares", "similares", "similares"), NULL))
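The error text comes from cluster::pam(), which requires 1 <= k <= n - 1 for n observations. A minimal sketch, assuming the cluster package (shipped with standard R installations) is available, that feeds bar to pam() directly for each k in pamk()'s default krange of 2:10 and records which calls succeed. The dimnames of bar are dropped here for brevity:

```r
# bar from the question (dimnames dropped): 5 observations,
# so pam() should accept k = 1, ..., 4 only
bar <- structure(c(0, 6, 6, 6, 6, 6, 0, 0, 0, 0, 6, 0, 0, 0, 0, 6, 0,
                   0, 0, 0, 6, 0, 0, 0, 0), .Dim = c(5L, 5L))

# Try every k from pamk()'s default krange and note whether pam() errors
ok <- vapply(2:10, function(k) {
  res <- tryCatch(
    cluster::pam(as.dist(bar), k = k, diss = TRUE),
    error = function(e) NULL  # pam() stops when k >= n
  )
  !is.null(res)
}, logical(1))
names(ok) <- 2:10

print(ok)
```

Only k = 2, 3 and 4 should succeed; every k from 5 upward hits the same "Number of clusters 'k' must be in {1,2, .., n-1}" check. foo has 14 observations, so the whole default range stays below n.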

Upvotes: 0

Views: 207

Answers (1)

user12728748

Reputation: 8506

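bar is a 5 × 5 matrix, so n = 5 and every value in krange must be at most n - 1 = 4. The default krange is 2:10, so pam() is eventually called with k = 5 and fails with exactly this error (foo, with n = 14, stays within the limit). You need to pass an appropriate krange; try: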

pamk.best <- fpc::pamk(bar, krange = 2:(nrow(bar) - 1))
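A slightly more general version of the same fix, as a sketch. valid_krange() is a hypothetical helper, not part of fpc, that drops any k above n - 1. The call also wraps bar in as.dist(): according to the fpc documentation, pamk()'s diss argument defaults to inherits(data, "dist"), so a plain matrix is clustered as raw data (rows as observations, columns as variables) rather than read as dissimilarities. The fpc call is guarded so the snippet still runs where fpc is not installed:

```r
# Hypothetical helper (not part of fpc): keep only the k values that
# cluster::pam() accepts for n observations, i.e. k <= n - 1
valid_krange <- function(n, krange = 2:10) {
  krange[krange <= n - 1]
}

# bar from the question (dimnames dropped)
bar <- structure(c(0, 6, 6, 6, 6, 6, 0, 0, 0, 0, 6, 0, 0, 0, 0, 6, 0,
                   0, 0, 0, 6, 0, 0, 0, 0), .Dim = c(5L, 5L))

kr <- valid_krange(nrow(bar))  # 2:4 for the 5 x 5 bar

if (requireNamespace("fpc", quietly = TRUE)) {
  # as.dist() makes a "dist" object, so pamk()'s diss argument defaults to
  # TRUE and the entries are treated as dissimilarities, not as variables
  pamk.best <- fpc::pamk(as.dist(bar), krange = kr)
  print(pamk.best$nc)
}
```

For bar this yields krange = 2:4; for foo (n = 14) the default 2:10 passes through unchanged. With very small inputs (n = 2) the result is empty, which pamk() cannot use, so such cases need separate handling.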

Upvotes: 1
