Reputation: 115
I'm getting the following error when calling NbClust():
Error in NbClust(data = ds[, sapply(ds, is.numeric)], diss = NULL, distance = "euclidean", : The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.
I called ds <- ds[complete.cases(ds),] just before running NbClust, so there are no missing values.
Any idea what's behind this error?
Thanks
Upvotes: 1
Views: 3411
Reputation: 128
I had the same issue in my research, so I emailed Nadia Ghazzali, the package maintainer, and got an answer. I'll attach my email and her reply below.
my e-mail:
Dear Nadia Ghazzali, I have some questions about the NbClust function in the NbClust R library. I have tried googling but could not find satisfying answers. First, I'm very grateful to you for making this awesome R library; it is very helpful for my research. I tested the NbClust function with my own data as shown below.
> clust <- NbClust(data, distance = "euclidean", min.nc = 2, max.nc = 10, method = "kmeans", index = "all")
But an error occurred right away:
Error: division by zero!
Error in Indices.WBT(x = jeu, cl = cl1, P = TT, s = ss, vv = vv) : object 'scott' not found
So I stepped through the NbClust function line by line and found that some indices (CCC, Scott, marriot, tracecovw, tracew, friedman, and rubin) were not calculated because the object vv was 0. I'm not very familiar with linear algebra, so I don't know what the eigenvalues mean, but it seems to me that the object ss (the square roots of eigenValues) should not become 0 once multiplied together. So here is my question: I assume my data is so sparse (a lot of zero values) that sqrt(eigenValues) becomes too small. Is that right? I'm sorry I can't attach my data, but I can attach part of eigenValues and their square roots.
> head(eigenValues)
[1] 0.039769880 0.017179826 0.007011972 0.005698736 0.005164871 0.004567238
> head(sqrt(eigenValues))
[1] 0.19942387 0.13107184 0.08373752 0.07548997 0.07186704 0.06758134
If my assumption is right, what can I do about this problem? Is dropping those 7 indices the only option? Thank you for reading; I look forward to your reply. Best regards!
and her reply:
Dear Hansol,
Thank you for your interest. Yes, your understanding is good. Unfortunately, the seven indices could not be applied.
Best regards,
Nadia Ghazzali
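To illustrate how vv can end up as 0 even though no single square-rooted eigenvalue is 0, here is a minimal base-R sketch. It assumes that vv is formed as the product of the ss values, which the email suggests but does not show, and it uses made-up values near the ones quoted above rather than real data.

```r
# Hypothetical stand-in for ss: 400 square-rooted eigenvalues, all near 0.07
# (similar in size to the values quoted in the email, but NOT the real data).
ss <- rep(0.07, 400)

# Multiplying many numbers below 1 underflows double precision to exactly 0,
# even though every individual factor is clearly nonzero.
vv <- prod(ss)
print(vv)

# The same quantity on the log scale stays finite.
log_vv <- sum(log(ss))
print(log_vv)
```

If that assumption holds, any index that divides by vv or takes its logarithm would fail in the way described, which is consistent with the maintainer's reply that those seven indices cannot be applied to such data.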
Upvotes: 6
Reputation: 2022
@seni The cause of this error is data related. If you look at the source code of this function,
NbClust <- function(data, diss="NULL", distance = "euclidean", min.nc=2, max.nc=15, method = "ward", index = "all", alphaBeale = 0.1)
{
    x <- 0
    min_nc <- min.nc
    max_nc <- max.nc
    jeu1 <- as.matrix(data)
    numberObsBefore <- dim(jeu1)[1]
    jeu <- na.omit(jeu1) # returns the object with incomplete cases removed
    nn <- numberObsAfter <- dim(jeu)[1]
    pp <- dim(jeu)[2]
    TT <- t(jeu)%*%jeu
    sizeEigenTT <- length(eigen(TT)$value)
    eigenValues <- eigen(TT/(nn-1))$value
    for (i in 1:sizeEigenTT)
    {
        if (eigenValues[i] < 0) {
            print(paste("There are only", numberObsAfter,"nonmissing observations out of a possible", numberObsBefore ,"observations."))
            stop("The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.")
        }
    }
    # (excerpt ends here; the rest of the function is not shown)
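As the excerpt shows, the error is raised as soon as any eigenvalue of TT/(nn-1) is negative. That check runs on the data alone, before any clustering is done, so it does not depend on max.nc. In exact arithmetic t(jeu) %*% jeu has no negative eigenvalues, so the negative values are most likely tiny round-off artifacts, which tend to appear when the matrix is rank-deficient. So to solve the problem, you must look at your data: check whether it has more columns than rows, remove missing values, and check for collinear (or multicollinear) columns and columns with zero or near-zero variance.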
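The checks suggested above can be sketched in base R as follows. This is only a rough diagnostic under stated assumptions: ds_num is a hypothetical numeric matrix standing in for ds[, sapply(ds, is.numeric)], and the eigenvalue computation mirrors the excerpt rather than calling NbClust itself.

```r
# Hypothetical wide data set: 5 rows, 8 numeric columns (more columns than rows).
set.seed(1)
ds_num <- matrix(rnorm(5 * 8), nrow = 5, ncol = 8)

n_rows <- nrow(ds_num)
n_cols <- ncol(ds_num)

# Numerical rank; a rank below n_cols means some columns are linear
# combinations of others (collinearity).
rank_ds <- qr(ds_num)$rank

# Same matrix the excerpt inspects before raising the error.
TT <- t(ds_num) %*% ds_num
eig <- eigen(TT / (n_rows - 1))$values

# Columns with zero variance carry no information for clustering.
zero_var_cols <- which(apply(ds_num, 2, var) == 0)

cat("rows:", n_rows, "columns:", n_cols, "rank:", rank_ds, "\n")
cat("smallest eigenvalue:", min(eig), "\n")
cat("zero-variance columns:", length(zero_var_cols), "\n")
```

With more columns than rows, the rank cannot exceed the number of rows, so several eigenvalues are zero in theory and may come out as slightly negative numbers in floating point. Dropping redundant columns, or reducing the dimension first (for example with PCA), is one way to get back to a full-rank matrix.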
For the other error, invalid clustering method, look at the source code of the method here, at lines 168 and 169 in the given link. You are getting this message because method is NA at that point, i.e. no valid clustering method was recognized:
if (is.na(method))
    stop("invalid clustering method")
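One way a method value can become NA is when a name is matched against a list of allowed names and nothing matches. The sketch below shows that pattern with base R's pmatch(). Both the list of names and the helper resolve_method() are assumptions made for illustration, not code taken from the package; check ?NbClust for the names your installed version accepts.

```r
# Illustrative list of method names (an assumption, see ?NbClust).
known_methods <- c("ward.D", "ward.D2", "single", "complete", "average",
                   "mcquitty", "median", "centroid", "kmeans")

# Hypothetical helper: returns the matched name or stops with the same
# message as the check quoted above.
resolve_method <- function(method) {
  idx <- pmatch(method, known_methods)  # NA when there is no unique match
  if (is.na(idx))
    stop("invalid clustering method")
  known_methods[idx]
}

ok <- resolve_method("kmeans")
print(ok)

# A misspelled name does not match anything, so the error is raised.
bad <- tryCatch(resolve_method("k-means"),
                error = function(e) conditionMessage(e))
print(bad)
```

If you hit this error, comparing the exact string you pass as method against the names listed in the package documentation is a quick first check.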
Upvotes: 0