Reputation: 4686
I use the class dgCMatrix from the Matrix package to store a square matrix of about 255 million values, with a size of about 1.7 MB.
However, after I perform variable <- variable/rowSums(variable), where variable is the sparse matrix, the result changes to class dgeMatrix and its size balloons to almost 2 GB, taking up all available memory and in some instances crashing the script.
Is there a way to coerce the output to remain in the class dgCMatrix?
I suspect the reason is that the number of non-zero elements increases to the point that the matrix is no longer considered sparse, because NaN is introduced in every element of a row whose sum is zero. If there's a workaround for the NaNs, I'm open to that too. Note, however, that I cannot avoid producing the zero rows, because my matrix needs to be square, and the corresponding column sums are generally non-zero.
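A minimal sketch that reproduces the behaviour described above on a tiny matrix. The 4x4 matrix m and the name dense_result are made up for illustration; the real matrix is far larger.

```r
library(Matrix)

# Made-up 4x4 example; row 3 has no entries, so its row sum is 0.
m <- sparseMatrix(i = c(1, 1, 2, 4), j = c(1, 3, 2, 3),
                  x = c(2, 2, 5, 4), dims = c(4, 4))

class(m)                        # sparse, column-compressed storage

# The row sums are recycled down the columns, so element [i, j] is
# divided by the sum of row i. Row 3 becomes 0/0 = NaN in every column.
dense_result <- m / rowSums(m)

class(dense_result)                    # the question reports dgeMatrix here
sum(is.nan(as.matrix(dense_result)))   # number of NaN cells
```

Every cell of a zero-sum row turns into NaN, and NaN cannot be left implicit the way 0 can, which is consistent with the result being stored densely.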
Upvotes: 1
Views: 1070
Reputation: 1
I have the same problem. This is the workaround I use to avoid NaNs and to keep the output in the class dgCMatrix:
tmp <- 1/rowSums(variable)
tmp[is.infinite(tmp)] <- 0
variable <- variable * tmp
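The same workaround as a self-contained sketch on a small matrix, so the effect can be checked before running it on the full data. The matrix m and the names inv and normalized are hypothetical, chosen only for this example.

```r
library(Matrix)

# Made-up 4x4 sparse matrix; row 3 is entirely zero.
m <- sparseMatrix(i = c(1, 1, 2, 4), j = c(1, 3, 2, 3),
                  x = c(2, 2, 5, 4), dims = c(4, 4))

inv <- 1 / rowSums(m)        # Inf where a row sums to zero
inv[is.infinite(inv)] <- 0   # zero rows are scaled by 0 instead of Inf

# inv is recycled down the columns, so element [i, j] is multiplied by inv[i].
# Multiplying 0 by a finite number is still 0, so no new non-zeros appear.
normalized <- m * inv

class(normalized)            # check that this is still a sparse class
rowSums(normalized)          # non-empty rows sum to 1, the empty row stays 0
```

This relies on the matrix having as many rows as inv has elements, which holds here because inv comes from rowSums of the same matrix.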
Upvotes: 0
Reputation: 630
You could use a simple ifelse for the divisor:
variable <- variable/ifelse(rowSums(variable)!=0,rowSums(variable),1)
Unless there's some reason you need to divide by 0 there, that seems like the simplest way to avoid NaNs.
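A small sketch of the same idea that computes the row sums only once, which may matter for a matrix this large. The names m, rs, divisor and scaled are made up for illustration, and whether the result stays in a sparse class may depend on the Matrix version, so it is worth checking class(scaled) on your own setup.

```r
library(Matrix)

# Made-up 4x4 sparse matrix; row 3 is entirely zero.
m <- sparseMatrix(i = c(1, 1, 2, 4), j = c(1, 3, 2, 3),
                  x = c(2, 2, 5, 4), dims = c(4, 4))

rs <- rowSums(m)                    # computed once and reused
divisor <- ifelse(rs != 0, rs, 1)   # rows that sum to zero are divided by 1

# With no zero in the divisor, 0/divisor is always 0, so no NaN is produced.
scaled <- m / divisor

class(scaled)      # inspect the resulting class
rowSums(scaled)    # non-empty rows sum to 1, the empty row stays 0
```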
Upvotes: 1