Daniel Goldfarb
Daniel Goldfarb

Reputation: 7714

reason for faster matrix allocation in R

Posting Best way to allocate matrix in R, NULL vs NA? shows that writing your own matrix allocation function in R can be 8 to 10 times faster than using R's built-in matrix() function to pre-allocate a large matrix.

Does anyone know why the hand crafted function is so much faster? What is R doing inside matrix() that is so slow? Thanks.

Here's the code on my system:

create.matrix <- function( nrow, ncol ) {
x<-matrix()
length(x) <- nrow*ncol
dim(x) <- c(nrow,ncol)
x
}

system.time( x <- matrix(nrow=10000, ncol=9999) )
user  system elapsed 
1.989   0.136   2.127 

system.time( y <- create.matrix( 10000, 9999 ) )
user  system elapsed 
0.192   0.141   0.332 
identical(x,y)
[1] TRUE

I appologize to those who commented thinking that the user-defined function was slower, since what is posted in the answer in the above link is inconsistent. I was looking at the user time, which is about 8 times faster in the above link, and on my system about 10 times faster for the user-defined vs built-in.

In response to Joshua's request for session info:

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_2.12.1

Also, I tried running Simon's three examples, and the third example that Simon gives as the fastest, turns out for me the slowest:

> system.time(matrix(NA, nrow=10000, ncol=9999)) 
   user  system elapsed 
  2.011   0.159   2.171 
> system.time({x=NA; length(x)=99990000; dim(x)=c(10000,9999); x}) 
   user  system elapsed 
  0.194   0.137   0.330 
> system.time(matrix(logical(0), nrow=10000, ncol=9999)) 
   user  system elapsed 
  4.180   0.200   4.385 

I still think however that Simon may be on the right track with the idea that matrix() initially allocates a 1x1 matrix and then copies it. Anyone know of any good documentation on R internals? Thanks.

Upvotes: 3

Views: 1757

Answers (4)

Matt Dowle
Matt Dowle

Reputation: 59612

Not sure this is the cause (could be a different inefficiency), but in do_matrix in src/array.c there is a type switch containing :

case LGLSXP :
    for (i = 0; i < nr; i++)
    for (j = 0; j < nc; j++)
        LOGICAL(ans)[i + j * NR] = NA_LOGICAL;

That looks to be page inefficient. Think it should be :

case LGLSXP :
    for (j = 0; j < nc; j++)
    for (i = 0; i < nr; i++)
        LOGICAL(ans)[i + j * NR] = NA_LOGICAL;

or more simply :

case LGLSXP :
    for (i = 0; i < nc*nr; i++)
        LOGICAL(ans)[i] = NA_LOGICAL;

( with some fine tuning required since NR is type R_xlen_t whilst i, nc and nr are type int ).


UPDATE:

After posting to r-devel :

Possible page inefficiency in do_matrix in array.c

Simon Urbanek has now committed a change to R. It now uses the single index approach above :

Latest live version of array.c

But as Simon says, and I covered myself for above, this doesn't appear to fix the particular issue raised by the question. A second, different, inefficiency needs to be found and fixed, too.


Here is perhaps what that follow up fix could be. This combines the page efficiency of the new code (now in R), but, switches to use that when matrix(data=NA) (R's default). This avoids the modulo in copyMatrix that Simon mentions in his answer, by avoiding copyMatrix in the NA case.

Currently, do_matrix in array.c has :

if(lendat) {
    if (isVector(vals))
        copyMatrix(ans, vals, byrow);
    else
        copyListMatrix(ans, vals, byrow);
} else if (isVector(vals)) { 
    // fill with NAs in the new page efficient way that Simon already committed.
}

which could be as follows. I'm not aware of an ISNA() function at C level that takes SEXP input, so have coded that long hand (Simon, is there a better way?) :

if(lendat && // but not a single NA, basically :
             !(lendat==1 &&
                  ((isLogical(vals) && LOGICAL(vals)[0] == NA_LOGICAL) ||
                   (isReal(vals) && ISNA(REAL(vals)[0])) ||
                   (isInteger(vals) && INTEGER(vals)[0] == NA_INTEGER)))) {
    if (isVector(vals))
        copyMatrix(ans, vals, byrow);
    else
        copyListMatrix(ans, vals, byrow);
} else if (isVector(vals)) { 
    // fill with NAs in the new page efficient way that Simon already committed.
    // this branch will now run when dat is a single NA, too
}

Upvotes: 4

Simon Urbanek
Simon Urbanek

Reputation: 13932

The problem is that your matrix call is a bit more complicated than you think it is. Compare the following versions:

# copy NA matrix
> system.time(matrix(NA, nrow=10000, ncol=9999))
   user  system elapsed 
  1.272   0.224   1.496 

# replicate NA vector (faster version of what you used)
> system.time({x=NA; length(x)=99990000; dim(x)=c(10000,9999); x})
   user  system elapsed 
  0.292   0.260   0.552 

# fastest - just allocate a matrix filled with NAs 
> system.time(matrix(logical(0), nrow=10000, ncol=9999))
   user  system elapsed 
  0.184   0.308   0.495 

So in your example you were essentially creating a 1 x 1 NA matrix which was replicated to the size you specified - the slowest approach. Doing the same with a vector is faster (because it doesn't need to use modulo for columns) - you did it in a bit complicated way (by creating a matrix, converting it into a vector and then back to a matrix), but the idea is the same. Finally, if you just use an empty vector, so the matrix will be simply filled with NAs what you wanted anyway and thus no extra work is needed (fastest).

EDIT An important note, though: Matthew's suggestion was correct although it was not involved (since the code he cited is the logical(0) case, not in the NA case). Inadvertently I was running R-devel in the above timings so timings in released R will vary.

Upvotes: 8

Tommy
Tommy

Reputation: 40851

Hmm. Yeah, it's weird. ...and this is slightly faster still - even though it's more like matrix() in that it allows for a single data argument (but it must be scalar):

create.matrix2 <- function(data=NA, nrow, ncol) {
  x <- rep.int(data[[1]], nrow*ncol)
  dim(x) <- c(nrow, ncol)
  x
}
system.time( x <- matrix(nrow=10000, ncol=9999) ) # 0.387 secs
system.time( y <- create.matrix(nrow=10000, ncol=9999) )  # 0.199 secs
system.time( z <- create.matrix2(nrow=10000, ncol=9999) ) # 0.173 secs
identical(x,z) # TRUE

...I guess the internal code for creating the matrix does something wasteful (or possibly useful, but I can't think of what that would be)...

Oh, since it handles data of any length, it might end up with something similar to rep(data, length.out=nrow*ncol) which is rather slow:

system.time( rep(NA, length.out=10000*9999) ) # 1.5 secs!

Anyway, definitely room for improvement!

Upvotes: 1

IRTFM
IRTFM

Reputation: 263411

I'm going to take issue with the comments, although I do understand most of them. The problem is that the referenced posting has an answer that has internal contradictions that the commenters have been relying on without checking. The timing for user and system do not add up correctly to elapsed as they should.

 create.matrix <- function(size) {
  x <- matrix()
  length(x) <- size^2
  dim(x) <- c(size,size)
  x
  }
  system.time(x <- matrix(data=NA,nrow=10000,ncol=10000))
#   user  system elapsed 
#  0.464   0.226   0.688 
 system.time(y <- create.matrix(size=10000))
#   user  system elapsed 
#  0.177   0.239   0.414 

I suspect that the efficiency is actually being achieved by the fact that the user-defined function is only capable of creating a square matrix and 'matrix' needs to do checks on validity of arguments for a more general situation.

EDIT: I see you have disproven one of my hypotheses (regarding the square matrix limitation) and I will also note that my other hypothesis that this was somehow due to lazy evaluation also failed my tests. The discrepancy really does not make sense because the user-code uses the matrix function.

Upvotes: 5

Related Questions