Crusader

Reputation: 31

Selecting nonzero values from a large sparse R matrix

I have a very large sparse matrix in R. For specified rows, I want to extract only the nonzero values from the respective columns (typically 5-10 out of 10000). With the View option, only a very small subset of the matrix can be displayed (it exceeds memory, I guess). I get the same problem when I use e.g. A[1, ] to extract the first row of A. Whenever I specify a row of the matrix, I would like to get a vector containing only the column indices and corresponding values where the value is above zero. Is there a smart way of doing this?

Upvotes: 3

Views: 2922

Answers (1)

Ajay

Reputation: 454

Assuming you have a sparse dgCMatrix and the user-selected row is stored in the variable 'rowIndx', the following code builds an index of all non-zero elements and then picks out the row of interest from it.

rowIndx <- 2
mm <- Matrix::Matrix(matrix(rbinom(2e4, 1, 0.10), ncol = 100))

Create the indices of non-zero elements

colN <- diff(mm@p)  # number of non-zero elements in each column
indx <- cbind(mm@i + 1, rep(seq_along(colN), colN))  # (row, column) indices of all non-zero elements
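To see what these two slots hold, here is a small sketch on a made-up 3 x 4 matrix (the matrix `m` and its values are assumptions for illustration only, not part of the original data). In a dgCMatrix, `@p` stores cumulative counts of stored entries per column and `@i` stores their zero-based row indices, so `diff(m@p)` gives the number of stored entries in each column.

```r
library(Matrix)

# Hypothetical 3 x 4 example; values chosen only for illustration
m <- sparseMatrix(i = c(1, 2, 2, 3), j = c(1, 2, 4, 3),
                  x = c(5, 7, 9, 1), dims = c(3, 4))

m@p        # column pointers: 0 1 2 3 4
m@i        # zero-based row indices, column by column: 0 1 2 1
diff(m@p)  # stored entries per column: 1 1 1 1

# One (row, column) pair per stored entry, built the same way as 'indx'
colN <- diff(m@p)
indx <- cbind(m@i + 1, rep(seq_along(colN), colN))
```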

Get the required column indices and values

cols <- indx[indx[, 1] == rowIndx, 2]  # column indices of the non-zero entries in the selected row
cols  # vector of non-zero column indices
mm[rowIndx, cols]  # vector of the corresponding non-zero values

For a large dgCMatrix with 2e8 elements, this method is three times faster than creating the indices with 'which':

indx <- which(mm != 0, arr.ind = TRUE)
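If you need this for many rows, the same idea can be wrapped in a small function. The sketch below is only one possible way to do it: the name `nonzero_in_row` and the example matrix are hypothetical, and it assumes the input is a dgCMatrix. It reads the values straight from the `@x` slot instead of subsetting the matrix again, and drops any explicitly stored zeros.

```r
library(Matrix)

# Hypothetical helper: column indices and values of the non-zero
# entries in one row of a dgCMatrix
nonzero_in_row <- function(m, row) {
  stopifnot(inherits(m, "dgCMatrix"))
  col_counts <- diff(m@p)                                  # stored entries per column
  col_of_entry <- rep(seq_along(col_counts), col_counts)   # column of each stored entry
  hit <- which(m@i + 1L == row)                            # stored entries in the requested row
  hit <- hit[m@x[hit] != 0]                                # skip explicitly stored zeros
  data.frame(col = col_of_entry[hit], value = m@x[hit])
}

# Made-up 3 x 4 example matrix, for illustration only
m <- sparseMatrix(i = c(1, 2, 2, 3), j = c(1, 2, 4, 3),
                  x = c(5, 7, 9, 1), dims = c(3, 4))
nonzero_in_row(m, 2)  # columns 2 and 4, values 7 and 9
```

Note that each call still scans all stored entries, so for very many row lookups a row-oriented representation may be worth considering.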

Upvotes: 2
