Reputation: 333
Why is assigning to a row/column of a matrix much slower than filling vector of the same size?
N <- 1e8
u <- numeric(N)
system.time(u <- 1:N)
##user system elapsed
##0.091 0.100 0.191
u <- matrix(nrow=2,ncol=N)
system.time(u[1,] <- 1:N)
##user system elapsed
##2.772 0.436 3.212
My main concern is with arrays, but I imagine the answer for matrices will explain arrays as well.
N <- 1e3
uu <- matrix(N,N)
system.time({uu <- 1:(N*N)})
##user system elapsed
##0.068 0.120 0.188
uu <- array(dim=c(2,N,N))
system.time({uu[1,,] <- 1:(N*N)})
##user system elapsed
##4.409 0.241 4.657
(I could not tell if assigning to an element of a large pre-allocated vector is slower than assigning to a variable, because the former always takes 0 time for any vector my system is capable of allocating.)
Upvotes: 1
Views: 52
Reputation: 73385
Because you are filling in a row not a column. You get more cache misses, as well as address arithmetic overhead (locating base address for each column).
In R, a matrix, or a general array is essentially a long vector; dimension is just an attribute. The fastest index is the rightmost one, so elements are contiguous when you scan the matrix by its column, or an array by its last margin. This means, on a typical machine with 64 bytes L1 cache line size which can hold 8 double precision numeric, you have 1 cache miss per scanning 8 elements. But if you access a matrix of two rows by row for example, you get 1 cache miss per 4 elements.
Another issue, giving you an unfair comparison, is you are comparing overwriting with assignment, not overwriting with overwriting. Using u[] <- 1:N
will force an overwriting to the preallocated vector u
.
Consider the following benchmarking:
library(microbenchmark)
n <- 10000
u <- 1:n
m1 <- matrix(0, n, 2)
m2 <- matrix(0, 2, n)
x <- numeric(n)
microbenchmark (m1[,1] <- u, m2[1,] <- u, x[] <- u)
Upvotes: 2