SimBea
SimBea

Reputation: 45

R: Convert obscure table into matrix

I have table that looks like this:

Row Col Value
1   1   31
1   2   56
1   8   13
2   1   83
2   2   51
2   9   16
3   2   53

I need to convert this table into matrix (Row column represents rows and Col column represents columns). For the output like this:

   1  2  3  4  5  6  7  8  9 
1 31 56 NA NA NA NA NA 13 NA
2 81 51 NA NA NA NA NA NA 16
3 NA 53 NA NA NA NA NA NA NA

I believe that there is quick way to do what I want as my solution would be looping for every row/column combination and cbind everything.

Reproducible example:

require(data.table)
myTable <- data.table(
           Row = c(1,1,1,2,2,2,3),
           Col = c(1,2,8,1,2,9,1),
           Value = c(31,56,13,83,51,16,53))

Upvotes: 3

Views: 132

Answers (4)

Frank
Frank

Reputation: 66819

Sparse matrix. You probably want a sparse matrix

require(Matrix) # doesn't require installation
mySmat <- with(myTable,sparseMatrix(Row,Col,x=Value))

which gives

3 x 9 sparse Matrix of class "dgCMatrix"

[1,] 31 56 . . . . . 13  .
[2,] 83 51 . . . . .  . 16
[3,] 53  . . . . . .  .  .

Matrix. If you really need a matrix-class object with NAs, there's

myMat <- as.matrix(mySmat)
myMat[myMat==0] <- NA

which gives

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]   31   56   NA   NA   NA   NA   NA   13   NA
[2,]   83   51   NA   NA   NA   NA   NA   NA   16
[3,]   53   NA   NA   NA   NA   NA   NA   NA   NA

Efficiency considerations. For shorter code:

myMat <- with(myTable,as.matrix(sparseMatrix(Row,Col,x=Value)))
myMat[myMat==0] <- NA

For faster speed (but slower than creating a sparse matrix), initialize to NA and then fill, as @jimmyb and @bgoldst do:

myMat <- with(myTable,matrix(,max(Row),max(Col)))
myMat[cbind(myTable$Row,myTable$Col)] <- myTable$Value

This workaround is only necessary if you insist on NAs over zeros. A sparse matrix is almost certainly what you should use. Creating and working with it should be faster; and storing it should be less memory-intensive.

Upvotes: 4

bgoldst
bgoldst

Reputation: 35314

I believe the most concise and performant way to achieve this is to preallocate the matrix with NAs, and then assign a vector slice by manually computing the linear indexes from Row and Col:

df <- data.frame(Row=c(1,1,1,2,2,2,3), Col=c(1,2,8,1,2,9,2), Value=c(31,56,13,83,51,16,53) );
m <- matrix(NA,max(df$Row),max(df$Col));
m[(df$Col-1)*nrow(m)+df$Row] <- df$Value;
m;
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## [1,]   31   56   NA   NA   NA   NA   NA   13   NA
## [2,]   83   51   NA   NA   NA   NA   NA   NA   16
## [3,]   NA   53   NA   NA   NA   NA   NA   NA   NA

Upvotes: 2

jimmyb
jimmyb

Reputation: 4387

Straightforward:

dat <- data.frame(
         Row = c(1,1,1,2,2,2,3),
       Col = c(1,2,8,1,2,9,1),
       Value = c(31,56,13,83,51,16,53))
m = matrix(NA, nrow = max(dat$Row), ncol = max(dat$Col))
m[cbind(dat$Row, dat$Col)] = dat$Value
m

Upvotes: 4

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

xtabs in base R is perfect for this if you can live with "0" where you have NA.

This would be the basic approach:

xtabs(Value ~ Row + Col, myTable)
#    Col
# Row  1  2  8  9
#   1 31 56 13  0
#   2 83 51  0 16
#   3 53  0  0  0

However, that doesn't fill in the gaps, because not all factor levels are available. You can do this separately, or on-the-fly, like this:

xtabs(Value ~ factor(Row, sequence(max(Row))) + 
      factor(Col, sequence(max(Col))), myTable)
#                                factor(Col, sequence(max(Col)))
# factor(Row, sequence(max(Row)))  1  2  3  4  5  6  7  8  9
#                               1 31 56  0  0  0  0  0 13  0
#                               2 83 51  0  0  0  0  0  0 16
#                               3 53  0  0  0  0  0  0  0  0

By extension, this means that if the "Row" and "Col" values are factors, dcast.data.table should also work:

dcast.data.table(myTable, Row ~ Col, value.var = "Value", drop = FALSE)

(But it doesn't in my test for some reason. I had to do library(reshape2); dcast(myTable, Row ~ Col, value.var = "Value", drop = FALSE) to get it to work, thus not taking advantage of "data.table" speed.)

Upvotes: 1

Related Questions