vinash85

Reputation: 431

Recoding of huge matrix in R

I have a huge matrix with values 1, 2 or 3 (and some NA). If the matrix is n x m, I have to recode it to n x 3m, with each value of the original matrix corresponding to 3 entries of the new matrix. If the value is x in the old matrix, then the xth of those 3 entries will be 1 and the other two will be 0 (if NA, all three are 0).

1, 3,  NA, 1

is recoded to

1 0 0 0 0 1 0 0 0 1 0 0

I.e.

1  = 1 0 0  
3  = 0 0 1
NA = 0 0 0
1  = 1 0 0 

I have to do this efficiently in R because the matrix is huge. What is the most efficient way to do this? The matrix is stored in a data.table.

Upvotes: 2

Views: 237

Answers (1)

thelatemail

Reputation: 93813

Pre-allocate a matrix of zeros, then set the required cells to 1 in a single step using a two-column (row, column) index matrix.

mat <- matrix(c(1,3,NA,1,1,3,NA,1),nrow=2,byrow=TRUE)
mat

#     [,1] [,2] [,3] [,4]
#[1,]    1    3   NA    1
#[2,]    1    3   NA    1

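# pre-allocate the n x 3m result, filled with zeros
newmat <- matrix(0, ncol=ncol(mat)*3, nrow=nrow(mat))
# (row, column) pairs: value x in column j maps to column 3*(j-1) + x
ind <- cbind(rep(1:nrow(mat),ncol(mat)), as.vector(mat + (col(mat)*3-3)))
newmat[ind] <- 1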

newmat
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#[1,]    1    0    0    0    0    1    0    0    0     1     0     0
#[2,]    1    0    0    0    0    1    0    0    0     1     0     0
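
The index arithmetic above can be wrapped in a small helper. This is only a minimal sketch, not part of the original answer: the function name `recode_onehot` and its `k` argument are hypothetical, and it assumes the values are whole numbers in 1..k. NA cells are dropped from the index explicitly, so their columns stay zero, and the result is stored as integer rather than double.

```r
# Hypothetical helper: recode an n x m matrix of values 1..k (NA allowed)
# into an n x (k*m) 0/1 matrix.
recode_onehot <- function(mat, k = 3L) {
  out <- matrix(0L, nrow = nrow(mat), ncol = ncol(mat) * k)
  keep <- !is.na(mat)  # NA cells contribute no 1s
  # value x in column j goes to column k*(j-1) + x of the output
  ind <- cbind(row(mat)[keep], (col(mat)[keep] - 1L) * k + mat[keep])
  out[ind] <- 1L
  out
}

mat <- matrix(c(1, 3, NA, 1, 1, 3, NA, 1), nrow = 2, byrow = TRUE)
res <- recode_onehot(mat)
```

Integer storage takes 4 bytes per cell instead of 8 for double, which may matter when the matrix is huge. If the data starts out in a data.table, it would need converting with `as.matrix()` first.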

You can also use this method with a sparse matrix from the Matrix package. Here the index rows containing NA are dropped with complete.cases() before the assignment.

library(Matrix)
newmat <- Matrix(0, ncol=ncol(mat)*3, nrow=nrow(mat),sparse=TRUE)
newmat[ind[complete.cases(ind),]] <- 1

newmat 
#2 x 12 sparse Matrix of class "dgCMatrix"
#                            
#[1,] 1 . . . . 1 . . . 1 . .
#[2,] 1 . . . . 1 . . . 1 . .

A sparse matrix stores only the non-zero entries. Since at most one entry in every three is non-zero here, this can substantially reduce memory use for a large matrix.
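
As a possible variation (a sketch, not something from the answer above), the sparse result could be built directly from (row, column, value) triplets with `sparseMatrix()` from the Matrix package, instead of assigning into an empty sparse matrix. The variable names `keep` and `sp` are just illustrative.

```r
library(Matrix)

mat <- matrix(c(1, 3, NA, 1, 1, 3, NA, 1), nrow = 2, byrow = TRUE)
keep <- !is.na(mat)  # skip NA cells; they produce no non-zero entry

# One triplet per non-NA cell: value x in column j maps to column 3*(j-1) + x
sp <- sparseMatrix(
  i = row(mat)[keep],
  j = as.integer((col(mat)[keep] - 1L) * 3L + mat[keep]),
  x = rep(1, sum(keep)),
  dims = c(nrow(mat), ncol(mat) * 3L)
)
```

Whether this is faster than indexed assignment for a given matrix is worth timing on real data. For a tiny example like this one, the sparse object's overhead outweighs any saving.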

Upvotes: 3
