Ricky

Reputation: 4686

Avoiding automatic conversion of dgCMatrix to dgeMatrix

I use the dgCMatrix class from the Matrix package to store a square matrix of about 255 million values, which takes up about 1.7 MB.

However, after I run variable <- variable/rowSums(variable), where variable is the sparse matrix, the result changes to class dgeMatrix and its size balloons to almost 2 GB, taking up all available memory and in some instances crashing the script.

Is there a way to force the output to remain in class dgCMatrix?

I suspect the reason is that the number of non-zero elements increases to the point where the matrix is no longer considered sparse, because NaN is introduced in every element of a row whose sum is zero. If there is a workaround that deals with the NaNs, I'm open to that too. Note, however, that I cannot avoid producing the zero rows, because my matrix needs to be square and the corresponding column sums are generally non-zero.
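The situation described above can be reproduced on a small scale. This is only a sketch, assuming the Matrix package is installed; the 3x3 matrix m is made up for illustration and is not the real data.

```r
library(Matrix)

# Hypothetical 3x3 sparse matrix whose second row is entirely zero
m <- sparseMatrix(i = c(1, 1, 3), j = c(1, 2, 3), x = c(2, 2, 5),
                  dims = c(3, 3))

rs <- rowSums(m)   # the second entry is 0 because row 2 is empty

# Every element of row 2 is computed as 0/0, which is NaN,
# so the result can no longer be stored as a sparse matrix.
res <- m / rs

class(res)                 # class after the division
sum(is.nan(as.matrix(res)))  # number of NaN entries that were introduced
```

A single zero row sum is enough to turn a whole row of structural zeros into stored NaN values.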

Upvotes: 1

Views: 1070

Answers (2)

Marco

Reputation: 1

I have the same problem. This is the workaround I use to avoid NaNs and keep the output in class dgCMatrix:

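tmp <- 1/rowSums(variable)    # Inf for rows whose sum is zero
tmp[is.infinite(tmp)] <- 0    # zero rows stay zero instead of becoming NaN
variable <- variable * tmp    # scales row i by tmp[i]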
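A different way to write the same row scaling is to multiply from the left by a sparse diagonal matrix. This is a sketch of an alternative, not the workaround above: Diagonal() comes from the Matrix package, and the toy matrix m and the names inv and scaled are made up for illustration.

```r
library(Matrix)

# Hypothetical 3x3 sparse matrix whose second row is entirely zero
m <- sparseMatrix(i = c(1, 1, 3), j = c(1, 2, 3), x = c(2, 2, 5),
                  dims = c(3, 3))

inv <- 1 / rowSums(m)         # Inf where the row sum is zero
inv[is.infinite(inv)] <- 0    # leave all-zero rows as zeros

# Left-multiplying by a diagonal matrix scales row i by inv[i]
scaled <- Diagonal(x = inv) %*% m

is(scaled, "sparseMatrix")    # check that the result is still sparse
rowSums(scaled)               # non-empty rows should now sum to 1
```

Because only the stored non-zero entries take part in the product, the structural zeros are never touched and no NaN can appear.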

Upvotes: 0

AodhanOL

Reputation: 630

You could try doing a simple ifelse function for the divisor:

rs <- rowSums(variable)    # compute the row sums once
variable <- variable/ifelse(rs != 0, rs, 1)

Unless there's some reason you actually need to divide by 0 there, that seems like the simplest way to avoid NaNs.

Upvotes: 1
