datajoel

Reputation: 100

Why is this for loop code as fast as apply()?

EDITED to provide reproducible results

It wasn't clear initially, but I need the results to take the NAs in the raw data (df) into account.


I initially wrote my code with for loops to get a proof of concept working, but now I need to speed things up. It runs in ~2-3 minutes with the for loops, but when I rewrote it using apply(), it was no faster. I thought apply() was the vectorized solution and would therefore be faster, but maybe my whole premise is incorrect? (I am not new to R, but computational speed is not usually an issue for me.)

I am working with 1000+ cases and ~100 variables and need to perform 5000+ simulations with the data (turning on and off different conditions).
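
(For scale: each cases x variables x simulations array holds 1000 * 100 * 5000 = 5e8 doubles, which at 8 bytes each is roughly 4 GB, and the code below allocates two of them, so memory pressure alone may be a factor.)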

Starting definitions & sample data:

cases = 1000
variables = 100
simulations = 5000

df <- as.data.frame(array(rnorm(cases * variables, 0, 5), dim = c(cases, variables)))

# Each simulation row switches a random ~20 of the variables "on" (1);
# zeros are recoded to NA so switched-off variables drop out of na.rm sums
montecarlo <- matrix(rbinom(simulations * variables, 1, 20 / variables), simulations, variables)
montecarlo[montecarlo == 0] <- NA

# Preallocate outputs
calc <- array(dim = c(cases, variables, simulations))
interim <- array(dim = c(cases, variables, simulations))
results <- array(dim = c(variables, simulations))

for (j in 1:simulations) {
  calc[,,j] <- exp(t(t(df) * as.numeric(montecarlo[j, ])))
}
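
For anyone unfamiliar with the double-transpose idiom: t(t(df) * v) scales each column of df by the corresponding element of v. An equivalent formulation with sweep(), shown only for clarity:

for (j in 1:simulations) {
  # multiply column i of df by montecarlo[j, i], then exponentiate
  calc[,,j] <- exp(sweep(as.matrix(df), 2, as.numeric(montecarlo[j, ]), FUN = "*"))
}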

For loop version:

for (j in 1:simulations) {
  interim[,,j] <- t(apply(calc[,,j], 1, function(x) x/sum(x, na.rm = TRUE))) # re-share
  results[,j] <- apply(interim[,,j], 2, sum) # aggregates results
}

Apply() version:

interim <- apply(calc, c(1,3), function(x) x/sum(x, na.rm = TRUE)) # re-share
results <- as.data.frame(t(apply(interim, c(1,3), sum))) # aggregates results
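
Note: apply() returns FUN's vector results along the first dimension of its output, so in this version interim has permuted dimensions relative to calc, which is why the aggregation margins differ from the for loop version:

dim(calc)     # cases x variables x simulations
dim(interim)  # variables x cases x simulations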

I am open to any suggestions to speed things up and/or a reason why the apply() version is not any faster. Thank you!

Upvotes: 1

Views: 165

Answers (2)

Cole

Reputation: 11255

As @Mikko Marttila noted, the apply() family does not guarantee faster code. Using sweep() and aperm(), the code below is about 3x faster for a 1,000 x 100 x 70 array (i.e., only 7 million elements).

results4 <- colSums(sweep(calc, c(1,3), colSums(aperm(calc, c(2,1,3)), na.rm = T), FUN = '/'), na.rm = T)
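
Unpacked into steps (the intermediate names are added here purely for readability), the one-liner works like this:

# 1. Row totals per slice: aperm() reorders calc to variables x cases x
#    simulations so that colSums() collapses the variables dimension,
#    leaving a cases x simulations matrix
row_totals <- colSums(aperm(calc, c(2, 1, 3)), na.rm = TRUE)

# 2. Divide each element by its row total; sweep() matches row_totals
#    against margins 1 (cases) and 3 (simulations)
shares <- sweep(calc, c(1, 3), row_totals, FUN = '/')

# 3. Collapse the cases dimension, leaving variables x simulations
results4 <- colSums(shares, na.rm = TRUE)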

Or, for slightly less performance but code that is closer to what you originally had:

interim3 <- sweep(calc, c(1,3), apply(calc, 3, rowSums, na.rm = T), FUN = '/')
results3 <- apply(interim3, c(2,3), sum, na.rm = T)
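
Here apply(calc, 3, rowSums, na.rm = TRUE) computes each slice's row totals and binds them column-wise, so the STATS argument arrives as a cases x simulations matrix that sweep() again matches against margins c(1, 3):

totals <- apply(calc, 3, rowSums, na.rm = TRUE)
dim(totals)  # cases x simulations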

Performance:

Unit: milliseconds
          expr      min       lq     mean   median       uq      max neval
      for_loop 510.9131 514.9030 537.0344 518.2491 524.5709 705.4087    10
      apply_OP 446.0352 458.4940 491.6710 500.1995 523.1843 533.9654    10
 sweep_rowSums 225.5855 233.2632 252.6149 240.7245 284.1517 292.3476    10
   sweep_aperm 136.2519 140.8912 163.7498 154.6984 191.5337 217.8015    10

Data

cases = 1000
variables = 100
simulations = 70

set.seed(123)
calc <- array(sample(cases * variables * simulations), dim = c(cases, variables, simulations))

interim <- array(dim = c(cases, variables, simulations))
results <- array(dim = c(variables, simulations))

# Original Loop
for (j in seq_len(simulations)) {
  interim[,,j] <- t(apply(calc[,,j], 1, function(x) x/sum(x, na.rm = TRUE))) # re-share
  results[,j] <- apply(interim[,,j], 2, sum) # aggregates results
}

# Original apply

interim2 <- apply(calc, c(1,3), function(x) x/sum(x, na.rm = TRUE)) # re-share
results2 <- apply(interim2, c(1,3), sum) # aggregates results

# Using sweep

interim3 <- sweep(calc, c(1,3), apply(calc, 3, rowSums, na.rm = T), FUN = '/')
results3 <- apply(interim3, c(2,3), sum, na.rm = T)

# Using sweep and aperm
# interim4 <- sweep(calc, c(1,3), colSums(aperm(calc, c(2,1,3)), na.rm = T), FUN = '/')
results4 <- colSums(sweep(calc, c(1,3), colSums(aperm(calc, c(2,1,3)), na.rm = T), FUN = '/'), na.rm = T)

# all.equal() compares two objects at a time
all.equal(results2, results)
all.equal(results3, results2)
all.equal(results4, results3)

library(microbenchmark)

microbenchmark(
  for_loop = {
    for (j in seq_len(simulations)) {
      interim[,,j] <- t(apply(calc[,,j], 1, function(x) x/sum(x, na.rm = TRUE))) # re-share
      results[,j] <- apply(interim[,,j], 2, sum) # aggregates results
    }
  }
  ,
  apply_OP = {
    interim2 <- apply(calc, c(1,3), function(x) x/sum(x, na.rm = TRUE)) # re-share
    results2 <- apply(interim2, c(1,3), sum) # aggregates results
  }
  ,
  sweep_rowSums = {
    interim3 <- sweep(calc, c(1,3), apply(calc, 3, rowSums, na.rm = T), FUN = '/')
    results3 <- apply(interim3, c(2,3), sum, na.rm = T)
  }
  ,
  sweep_aperm = {
    results4 <- colSums(sweep(calc, c(1,3), colSums(aperm(calc, c(2,1,3)), na.rm = T), FUN = '/'), na.rm = T)
  }
  , times = 10
)

Upvotes: 1

Mikko Marttila

Reputation: 11898

Generally speaking: for loops aren't inherently slow. If you're preallocating the output (i.e. not growing a vector, which causes repeated copies), they're pretty much comparable to the *apply() functions in speed. The overhead in iterating comes from repeatedly calling R functions from C code, and the advantage of using functionals is simply clarity. Here's an example with a for loop wrapped in a function:

foo <- function(x, f, ...) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], ...)
  }
  out
}

x <- replicate(10000, rnorm(30), simplify = FALSE)
bench::mark(foo(x, mean), lapply(x, mean))
#> # A tibble: 2 x 6
#>   expression           min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>      <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 foo(x, mean)      36.9ms   38.4ms      26.1   157.3KB     52.3
#> 2 lapply(x, mean)   42.6ms   44.9ms      22.3    78.2KB    100.

The way to get speed improvements in these cases is to move all of the calculations into compiled code.
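
For instance (a sketch that works only because every element of x above has the same length), the 10,000 mean() calls can be replaced with a single call into compiled code:

# flatten the list once and let colMeans() do the looping in C
m <- matrix(unlist(x), nrow = 30)
means <- colMeans(m)
all.equal(means, vapply(x, mean, numeric(1)))
#> [1] TRUE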

That said, there may well be other optimizations for your specific problem. You might want to provide a reproducible example, and ask a new question on Code Review about performance improvements.

Created on 2019-09-02 by the reprex package (v0.3.0.9000)

Upvotes: 2
