Mark Heckmann

Reputation: 11431

Clustering using non-negative matrix factorization (IntNMF): what to do about the 'a dimension is zero' error

Using the IntNMF package I want to find clusters in a dataset. My data is a sparse matrix (80-90% zeros) with subjects in the rows and features in the columns. For some reason, I get an error and I cannot figure out why or what to do about it.

library(IntNMF)
library(Matrix)  # provides rankMatrix()

# small sparse test matrix: subjects in rows, features in columns
set.seed(4)
n <- 10
p <- 30
m <- matrix(sample(0:3, size = n*p, replace = TRUE,
                   prob = c(5,1,1,1)), ncol=p)
any(rowSums(m) == 0)  # no zero rows  
any(colSums(m) == 0)  # no zero columns
rankMatrix(m) == n    # full row rank

# finding the optimal number of clusters
opt.k <- nmf.opt.k(dat=m, n.runs=5, n.fold=2, k.range=2:4, 
                   result=TRUE, make.plot=TRUE, 
                   progress=TRUE)

The error I get is:

error in svd(X) : a dimension is zero

I assumed that sparsity would not be a problem, but maybe it is. I am not very familiar with NMF or the IntNMF package yet, so any hints are appreciated.

Upvotes: 1

Views: 1996

Answers (1)

Keith Hughitt

Reputation: 4960

Not sure what the issue is with IntNMF, but it also fails for other seeds, as well as when using the default arguments for nmf.opt.k.

In the meantime, I would recommend trying the NMF package instead.

I tested it out with your test matrix and it worked fine:

> nmf(m, rank=2)
<Object of class: NMFfit>
 # Model:
  <Object of class:NMFstd>
  features: 10 
  basis/rank: 2 
  samples: 30 
 # Details:
  algorithm:  brunet 
  seed:  random 
  RNG: 403L, 20L, ..., 961813654L [05ac8381a0361b9c9d54208dfe6a12cb]
  distance metric:  'KL' 
  residuals:  162.3778 
  Iterations: 480 
  Timing:
     user  system elapsed 
    0.047   0.000   0.046 
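
Note that the output above reports 10 features and 30 samples: NMF treats rows as features and columns as samples, whereas the question has subjects in rows. A minimal sketch of clustering the subjects could therefore transpose the matrix first. The rank of 2, the seed value, and the guard that drops all-zero columns are assumptions chosen for illustration, not something taken from the question:

```r
library(NMF)

# Rebuild the test matrix from the question (subjects in rows, features in columns)
set.seed(4)
n <- 10
p <- 30
m <- matrix(sample(0:3, size = n * p, replace = TRUE,
                   prob = c(5, 1, 1, 1)), ncol = p)

# Guard (assumption): drop all-zero feature columns, since they would
# become all-zero rows after transposing
m <- m[, colSums(m) > 0, drop = FALSE]

# Transpose so that subjects become the columns (samples) that NMF clusters
fit <- nmf(t(m), rank = 2, seed = 123456)

# Assign each subject to the basis component with the largest coefficient
clusters <- predict(fit)
table(clusters)
```

Here `predict()` returns one cluster label per subject; which subjects land in which cluster depends on the seed and the algorithm, so several runs are usually worth comparing.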

There is also a section on estimating the rank of the factorization (selecting a k) in the vignette for NMF (section 2.6).
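
As a rough sketch of that rank-estimation approach, `nmf()` can be given a range of ranks and a number of runs per rank, and the resulting quality measures compared across ranks. The range 2:4 mirrors the `k.range` from the question; the number of runs, the seed, and the transposition (so that subjects are treated as samples) are assumptions made for this example:

```r
library(NMF)

# Test matrix from the question (subjects in rows, features in columns)
set.seed(4)
n <- 10
p <- 30
m <- matrix(sample(0:3, size = n * p, replace = TRUE,
                   prob = c(5, 1, 1, 1)), ncol = p)
m <- m[, colSums(m) > 0, drop = FALSE]  # guard against all-zero features

# Fit each candidate rank several times (subjects as columns/samples)
estim <- nmf(t(m), 2:4, nrun = 10, seed = 123456)

# Quality measures per rank, e.g. the cophenetic correlation coefficient
estim$measures[, c("rank", "cophenetic", "rss")]

# Optional: plot the measures against the rank
plot(estim)
```

A common heuristic is to pick the rank at which the cophenetic coefficient starts to drop, but with a matrix this small the measures are likely to be noisy.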

Upvotes: 1
