badmax

Reputation: 645

Why is the product of all elements of a matrix much faster with apply?

I want to multiply all the elements of a matrix together. I can do it either with two for loops or with apply. My intuition was that the for loops would be faster: apply has to create a temporary vector to store the row products, then call prod on that.

It still has to loop over all the elements to multiply them, so the only extra work compared to the for loops is storing the intermediate results. Yet apply is still about 4 times faster. Why is that?

cols <- 1000
rows <- 1000

a <- matrix(runif(cols * rows, 1, 2), nrow = rows)

# Naive double loop: multiply every element into a running product
system.time({
  result <- 1
  for (i in 1:nrow(a)) {
    for (j in 1:ncol(a)) {
      result <- result * a[i, j]
    }
  }
})
# 0.09s

# apply: product of each row, then the product of those
system.time(result <- prod(apply(a, 1, prod)))
# 0.01s

Upvotes: 1

Views: 428

Answers (3)

IRTFM

Reputation: 263411

Here's what I got from an effort to benchmark various methods. I have some concerns about the fact that Inf was the result of many of the calculations (a quick check of this appears after the first set of timings below), and I wondered whether restricting the values to the range [0, 1] might make a difference. Like @badmax, I was surprised that prod(a) was relatively slow; it seemed to me that it should be coded in C and therefore more efficient. I also reasoned that a column-oriented approach might be faster than a row-oriented one, since that is how R stores matrices, and this proved correct:

library(microbenchmark)
cols <- 1000
rows <- 1000
a <- matrix(runif(cols * rows, 1, 2), nrow = rows)

microbenchmark(
  loop1 = {
    result <- 1
    for (i in 1:nrow(a)) {
      for (j in 1:ncol(a)) {
        result <- result * a[i, j]
      }
    }
  },
  loop2 = {
    result <- 1
    for (j in 1:ncol(a)) {
      result <- result * prod(a[, j])
    }
  },
  loop3 = {
    result <- 1
    for (i in 1:nrow(a)) {
      result <- result * prod(a[i, ])
    }
  },
  apply_test = {result <- prod(apply(a, 1, prod))},
  prod_test = {result <- prod(a)},
  Reduce_test = {result <- Reduce("*", a)},
  log_sum = {result <- exp(sum(log(a)))}  # since a sum of logs == the log of a product
)
#====================
Unit: milliseconds
        expr        min         lq       mean     median         uq       max neval   cld
       loop1  58.872740  59.782277  60.665321  60.246169  61.156176  67.33558   100   c  
       loop2   5.314437   5.843748   7.316167   6.024948   6.626402  57.36532   100 a    
       loop3   9.614727  10.248335  11.521343  10.541872  10.947829  45.08280   100 ab   
  apply_test   8.336721   8.924148   9.960122   9.166424   9.429118  17.35621   100 ab   
   prod_test  94.314333  95.438939  95.956394  95.911858  96.286444  98.54229   100    d 
 Reduce_test 292.907175 312.754959 389.959756 354.369616 511.151578 545.80829   100     e
     log_sum  19.258281  19.916965  22.333617  20.134510  20.551704 180.18492   100  b   
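
As a quick check of the Inf concern, here is a minimal sketch (assuming the same 1000 x 1000 matrix of values drawn from [1, 2]) comparing the log of the true product with the largest log a double can represent:

# The true product of 1e6 values in [1, 2] vastly exceeds double range,
# so prod(a) and every loop variant above ends up returning Inf.
sum(log(a))                 # log of the true product: roughly 3.9e5
log(.Machine$double.xmax)   # largest representable log: about 709.78
prod(a)                     # Inf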

I think the apply_test outcome is essentially doing the same thing as loop2, perhaps with a bit of overhead penalty for apply. Here are the results for a test case where the range of random values was restricted to [0, 1] instead of [1, 2], and they confirm my suspicion that some of the difference lies in the handling of Inf values:

Unit: milliseconds
        expr        min         lq       mean     median         uq        max neval  cld
       loop1  56.693831  58.846847  59.688896  59.448108  60.208619  63.005431   100   c 
       loop2   5.667955   5.907125  10.090634   6.109151   6.532552 183.985617   100 ab  
       loop3   9.533779  10.080330  12.760057  10.431867  10.734991 183.262217   100 ab  
  apply_test   8.144015   8.622861   9.940263   8.904425   9.962390  17.470028   100 ab  
   prod_test   1.327710   1.371449   1.411990   1.394160   1.432646   1.677596   100 a   
 Reduce_test 293.697339 312.384739 385.437743 356.997439 500.446356 557.253762   100    d
     log_sum  22.444015  23.224879  24.064932  23.539085  24.210656  29.774315   100  b  

The prod function is now rescued from its lowly position relative to the loop and apply methods.
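
The column-major point is easy to verify directly; a minimal sketch (independent of the benchmark matrix):

# R stores matrices column-major: linear indexing walks down columns,
# so a[, j] is a contiguous block while a[i, ] gathers strided elements.
m <- matrix(1:6, nrow = 2)
m[1:4]   # 1 2 3 4: the first two columns, confirming column order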

Upvotes: 3

Omry Atia

Reputation: 2443

To vectorize instead of using apply, you can use rowProds from the matrixStats package:

library(matrixStats)

microbenchmark::microbenchmark({AA = prod(rowProds(a))}, times = 10)

This takes around 18 milliseconds.
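
Given the column-major storage noted above, colProds from the same package may be worth benchmarking as well; a sketch, assuming the same matrix a as above:

library(matrixStats)

# colProds walks contiguous columns, matching R's column-major layout,
# so it may edge out rowProds on the same data.
microbenchmark::microbenchmark(
  rows = prod(rowProds(a)),
  cols = prod(colProds(a)),
  times = 10
)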

Upvotes: 1

mickey

Reputation: 2188

It looks like apply is faster because there is still some vectorization going on: prod over each row is a single vectorized call. Consider this for loop:

system.time({
  result <- 1
  for (i in 1:nrow(a)) {
    result <- result * prod(a[i, ])
  }
})

which, for me, runs about as fast as the apply version.
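
For comparison, the column-wise variant below uses the same vectorized prod but walks contiguous columns; per the column-major discussion above, it is typically a touch faster still (cf. loop2 in the benchmark):

system.time({
  result <- 1
  for (j in 1:ncol(a)) {
    # prod over a contiguous column rather than a strided row
    result <- result * prod(a[, j])
  }
})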

Upvotes: 0
