Reputation: 7714
Posting Best way to allocate matrix in R, NULL vs NA? shows that writing your own matrix allocation function in R can be 8 to 10 times faster than using R's built-in matrix() function to pre-allocate a large matrix.
Does anyone know why the hand crafted function is so much faster? What is R doing inside matrix() that is so slow? Thanks.
Here's the code on my system:
create.matrix <- function( nrow, ncol ) {
x<-matrix()
length(x) <- nrow*ncol
dim(x) <- c(nrow,ncol)
x
}
system.time( x <- matrix(nrow=10000, ncol=9999) )
user system elapsed
1.989 0.136 2.127
system.time( y <- create.matrix( 10000, 9999 ) )
user system elapsed
0.192 0.141 0.332
identical(x,y)
[1] TRUE
I appologize to those who commented thinking that the user-defined function was slower, since what is posted in the answer in the above link is inconsistent. I was looking at the user time, which is about 8 times faster in the above link, and on my system about 10 times faster for the user-defined vs built-in.
In response to Joshua's request for session info:
> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_2.12.1
Also, I tried running Simon's three examples, and the third example that Simon gives as the fastest, turns out for me the slowest:
> system.time(matrix(NA, nrow=10000, ncol=9999))
user system elapsed
2.011 0.159 2.171
> system.time({x=NA; length(x)=99990000; dim(x)=c(10000,9999); x})
user system elapsed
0.194 0.137 0.330
> system.time(matrix(logical(0), nrow=10000, ncol=9999))
user system elapsed
4.180 0.200 4.385
I still think however that Simon may be on the right track with the idea that matrix()
initially allocates a 1x1 matrix and then copies it. Anyone know of any good documentation on R internals? Thanks.
Upvotes: 3
Views: 1757
Reputation: 59612
Not sure this is the cause (could be a different inefficiency), but in do_matrix
in src/array.c there is a type switch containing :
case LGLSXP :
for (i = 0; i < nr; i++)
for (j = 0; j < nc; j++)
LOGICAL(ans)[i + j * NR] = NA_LOGICAL;
That looks to be page inefficient. Think it should be :
case LGLSXP :
for (j = 0; j < nc; j++)
for (i = 0; i < nr; i++)
LOGICAL(ans)[i + j * NR] = NA_LOGICAL;
or more simply :
case LGLSXP :
for (i = 0; i < nc*nr; i++)
LOGICAL(ans)[i] = NA_LOGICAL;
( with some fine tuning required since NR
is type R_xlen_t
whilst i
, nc
and nr
are type int
).
UPDATE:
After posting to r-devel :
Possible page inefficiency in do_matrix in array.c
Simon Urbanek has now committed a change to R. It now uses the single index approach above :
Latest live version of array.c
But as Simon says, and I covered myself for above, this doesn't appear to fix the particular issue raised by the question. A second, different, inefficiency needs to be found and fixed, too.
Here is perhaps what that follow up fix could be. This combines the page efficiency of the new code (now in R), but, switches to use that when matrix(data=NA)
(R's default). This avoids the modulo in copyMatrix
that Simon mentions in his answer, by avoiding copyMatrix
in the NA
case.
Currently, do_matrix in array.c has :
if(lendat) {
if (isVector(vals))
copyMatrix(ans, vals, byrow);
else
copyListMatrix(ans, vals, byrow);
} else if (isVector(vals)) {
// fill with NAs in the new page efficient way that Simon already committed.
}
which could be as follows. I'm not aware of an ISNA()
function at C level that takes SEXP
input, so have coded that long hand (Simon, is there a better way?) :
if(lendat && // but not a single NA, basically :
!(lendat==1 &&
((isLogical(vals) && LOGICAL(vals)[0] == NA_LOGICAL) ||
(isReal(vals) && ISNA(REAL(vals)[0])) ||
(isInteger(vals) && INTEGER(vals)[0] == NA_INTEGER)))) {
if (isVector(vals))
copyMatrix(ans, vals, byrow);
else
copyListMatrix(ans, vals, byrow);
} else if (isVector(vals)) {
// fill with NAs in the new page efficient way that Simon already committed.
// this branch will now run when dat is a single NA, too
}
Upvotes: 4
Reputation: 13932
The problem is that your matrix
call is a bit more complicated than you think it is. Compare the following versions:
# copy NA matrix
> system.time(matrix(NA, nrow=10000, ncol=9999))
user system elapsed
1.272 0.224 1.496
# replicate NA vector (faster version of what you used)
> system.time({x=NA; length(x)=99990000; dim(x)=c(10000,9999); x})
user system elapsed
0.292 0.260 0.552
# fastest - just allocate a matrix filled with NAs
> system.time(matrix(logical(0), nrow=10000, ncol=9999))
user system elapsed
0.184 0.308 0.495
So in your example you were essentially creating a 1 x 1 NA matrix which was replicated to the size you specified - the slowest approach. Doing the same with a vector is faster (because it doesn't need to use modulo for columns) - you did it in a bit complicated way (by creating a matrix, converting it into a vector and then back to a matrix), but the idea is the same. Finally, if you just use an empty vector, so the matrix will be simply filled with NA
s what you wanted anyway and thus no extra work is needed (fastest).
EDIT An important note, though: Matthew's suggestion was correct although it was not involved (since the code he cited is the logical(0)
case, not in the NA
case). Inadvertently I was running R-devel in the above timings so timings in released R will vary.
Upvotes: 8
Reputation: 40851
Hmm. Yeah, it's weird. ...and this is slightly faster still - even though it's more like matrix() in that it allows for a single data argument (but it must be scalar):
create.matrix2 <- function(data=NA, nrow, ncol) {
x <- rep.int(data[[1]], nrow*ncol)
dim(x) <- c(nrow, ncol)
x
}
system.time( x <- matrix(nrow=10000, ncol=9999) ) # 0.387 secs
system.time( y <- create.matrix(nrow=10000, ncol=9999) ) # 0.199 secs
system.time( z <- create.matrix2(nrow=10000, ncol=9999) ) # 0.173 secs
identical(x,z) # TRUE
...I guess the internal code for creating the matrix does something wasteful (or possibly useful, but I can't think of what that would be)...
Oh, since it handles data
of any length, it might end up with something similar to rep(data, length.out=nrow*ncol)
which is rather slow:
system.time( rep(NA, length.out=10000*9999) ) # 1.5 secs!
Anyway, definitely room for improvement!
Upvotes: 1
Reputation: 263411
I'm going to take issue with the comments, although I do understand most of them. The problem is that the referenced posting has an answer that has internal contradictions that the commenters have been relying on without checking. The timing for user and system do not add up correctly to elapsed as they should.
create.matrix <- function(size) {
x <- matrix()
length(x) <- size^2
dim(x) <- c(size,size)
x
}
system.time(x <- matrix(data=NA,nrow=10000,ncol=10000))
# user system elapsed
# 0.464 0.226 0.688
system.time(y <- create.matrix(size=10000))
# user system elapsed
# 0.177 0.239 0.414
I suspect that the efficiency is actually being achieved by the fact that the user-defined function is only capable of creating a square matrix and 'matrix' needs to do checks on validity of arguments for a more general situation.
EDIT: I see you have disproven one of my hypotheses (regarding the square matrix limitation) and I will also note that my other hypothesis that this was somehow due to lazy evaluation also failed my tests. The discrepancy really does not make sense because the user-code uses the matrix
function.
Upvotes: 5