Why is using a for loop faster than apply, in computing the norm of all rows/columns of a matrix?

Question

Consider the following

n <- 10^4
p <- 2
foo <- matrix(runif(p*n), n, p)

I would like to compute the norm of each row of the matrix, i.e., to compute sqrt(crossprod(a_i)) where a_i is the i-th row of foo. I can do this with apply, or with a for loop:

for_loop <- function(x){
  range <- seq_along(x[,1])
  foo <- range
  for (i in range){
    foo[i] <- sqrt(crossprod(x[i,]))
    }
  foo
}

use_apply <- function(x){
  apply(x, 1, function(r) sqrt(crossprod(r)))
}

I thought the simpler apply code would be faster, however:

> microbenchmark(for_loop(foo), use_apply(foo), times = 1000)
Unit: milliseconds
           expr      min       lq     mean   median       uq      max neval
  for_loop(foo) 16.07111 18.87690 24.25369 20.78997 27.66441 179.8374  1000
 use_apply(foo) 24.77948 29.05891 35.98689 31.89625 40.30085 205.1632  1000

note that times = 1000 can take quite a bit of time, if you don't have a fast machine you may want to use microbenchmark defaults. Why is apply slower than the for loop code? Is there some function from purrr which would be faster?

EDIT I couldn't believe that crossprod(x) would be so much slower than sum(x*x), so I wanted to check Emmanuel-Lin's results. I get very different timings:

my_loop <- function(x){
  range <- seq_along(x[,1])
  foo <- range
  for (i in range){
    foo[i] <- sqrt(sum((x[i,] *x[i,])))
    }
  foo
}

my_apply <- function(x){
  apply(x, 1, function(r) sqrt(sum(r*r)))
}

for_loop <- function(x){
  range <- seq_along(x[,1])
  foo <- range
  for (i in range){
    foo[i] <- sqrt(crossprod(x[i,]))
  }
  foo
}

use_apply <- function(x){
  apply(x, 1, function(r) sqrt(crossprod(r)))
}
> microbenchmark(for_loop(foo), my_loop(foo), use_apply(foo), my_apply(foo))
Unit: milliseconds
           expr       min       lq     mean   median       uq       max neval
  for_loop(foo) 16.299758 17.77176 21.59988 19.04428 22.44558 131.33819   100
   my_loop(foo)  9.950813 12.02106 14.43540 12.66142 15.26865  45.42030   100
 use_apply(foo) 25.480019 27.95396 31.98351 29.85244 36.41599  60.88678   100
  my_apply(foo) 13.277354 14.98329 17.60356 15.98103 19.70325  34.07097   100

ok, my_apply and my_loop are faster (I still can't believe it! What, is crossprod optimized for slowness? :-/) but not so faster as Emmanuel-lin found. It's probably related to some dimension congruence checks which crossprod performs.

LyzandeR · Accepted Answer

apply is literally an R for-loop if you check the code:

   #only the for-loop code shown here
   if (length(d.call) < 2L) {
        if (length(dn.call)) 
            dimnames(newX) <- c(dn.call, list(NULL))
        for (i in 1L:d2) {
            tmp <- forceAndCall(1, FUN, newX[, i], ...)
            if (!is.null(tmp)) 
                ans[[i]] <- tmp
        }
    }
    else for (i in 1L:d2) {
        tmp <- forceAndCall(1, FUN, array(newX[, i], d.call, 
            dn.call), ...)
        if (!is.null(tmp)) 
            ans[[i]] <- tmp
    }

In addition to the above, apply will run a series of checks too, to make sure the arguments you provided were correct. It is the above that make it a bit slower.

However, lapply, sapply and vapply are C-based for-loops and therefore much faster than an R-based for loop.

Why is using a for loop faster than apply, in computing the norm of all rows/columns of a matrix?

Answers (2)

Related Questions