Reputation: 11431
Using the IntNMF package, I want to find clusters in a dataset.
My data is a sparse matrix (80-90% zeros) with subjects in the rows and features in the columns. For some reason, I get an error and I cannot figure out why or what to do about it.
library(IntNMF)
library(Matrix)  # provides rankMatrix()
set.seed(4)
n <- 10
p <- 30
m <- matrix(sample(0:3, replace = TRUE, size = n * p,
                   prob = c(5, 1, 1, 1)), ncol = p)
any(rowSums(m) == 0)  # no zero rows
any(colSums(m) == 0)  # no zero columns
rankMatrix(m) == n    # full row rank
# finding the optimal number of clusters
opt.k <- nmf.opt.k(dat = m, n.runs = 5, n.fold = 2, k.range = 2:4,
                   result = TRUE, make.plot = TRUE,
                   progress = TRUE)
The error I get is:
Error in svd(X) : a dimension is zero
I assumed that sparsity would not be a problem, but maybe it is. I am not very familiar with NMF or the IntNMF package yet, so any hints are appreciated.
Upvotes: 1
Views: 1996
Reputation: 4960
Not sure what the issue is with IntNMF, but it also fails for other seeds, as well as when using the default arguments for nmf.opt.k.
I would recommend checking out the NMF library in the meantime instead.
I tested it out with your test matrix and it worked fine:
> nmf(m, rank=2)
<Object of class: NMFfit>
# Model:
<Object of class:NMFstd>
features: 10
basis/rank: 2
samples: 30
# Details:
algorithm: brunet
seed: random
RNG: 403L, 20L, ..., 961813654L [05ac8381a0361b9c9d54208dfe6a12cb]
distance metric: 'KL'
residuals: 162.3778
Iterations: 480
Timing:
user system elapsed
0.047 0.000 0.046
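Note that the summary above reports 10 features and 30 samples: the NMF package treats rows as features and columns as samples, whereas the question has subjects in the rows. A minimal sketch of pulling cluster labels out of the fit, assuming the package's `predict` method with `what = "rows"` does what you need here; the `seed` value and the variable names are arbitrary choices of mine:

```r
library(NMF)

# Same test matrix as in the question
set.seed(4)
n <- 10
p <- 30
m <- matrix(sample(0:3, replace = TRUE, size = n * p,
                   prob = c(5, 1, 1, 1)), ncol = p)

# Rank-2 fit; the seed is arbitrary and only there for reproducibility
fit <- nmf(m, rank = 2, seed = 123456)

# Basis matrix W (rows x rank) and coefficient matrix H (rank x columns)
w <- basis(fit)
h <- coef(fit)

# One cluster label per row (the subjects in the question's layout),
# taken from the dominant basis component of each row
row_clusters <- predict(fit, what = "rows")

# One cluster label per column (the features in the question's layout)
col_clusters <- predict(fit, what = "columns")

table(row_clusters)
```

Alternatively, you could transpose the matrix with `t(m)` before fitting so that subjects become the package's "samples", provided the transposed matrix has no all-zero rows.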
There is also a section on estimating the rank of the factorization (selecting a k) in the vignette for NMF (section 2.6).
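As a rough sketch of what that rank survey could look like on your test matrix, assuming the same range 2:4 as in your `nmf.opt.k` call (the `nrun` and `seed` values are arbitrary choices of mine, and more runs per rank would probably give more stable measures):

```r
library(NMF)

# Same test matrix as in the question
set.seed(4)
n <- 10
p <- 30
m <- matrix(sample(0:3, replace = TRUE, size = n * p,
                   prob = c(5, 1, 1, 1)), ncol = p)

# Fit every rank in 2:4, with several random restarts per rank
estim <- nmf(m, rank = 2:4, nrun = 10, seed = 123456)

# Quality measures per rank (cophenetic coefficient, RSS, and so on)
print(estim$measures)

# Plot the measures against the rank to help pick k
plot(estim)
```

The vignette describes how to read these measures when choosing k; for example, it discusses looking at where the cophenetic coefficient starts to drop.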
Upvotes: 1