user1134616

Reputation: 604

Parallelize a function in R (not a loop!)

I want to parallelize a function called unparallelizedfnc. The function calls four other functions (each of which takes a long time to compute), stores their results, and combines them at the end. Consider the toy example below (these are not the four functions I actually call; they are for demonstration only).

How do I parallelize the computation of result1, result2, result3 and result4 on a computer with multiple cores? I would like it to work on Windows, Linux and Mac OS X. There is no need to benchmark the parallelized version of this toy example (it will be slower due to overhead, but with my real code it will be faster).

If the four results came from the same function (applied to different data), I could just use a parallel for loop (foreach) or a parallel apply, but in this case the functions are different.

unparallelizedfnc <- function(x) {

  result1 <- sum(x)
  result2 <- median(x)
  result3 <- min(x)
  result4 <- max(x)

  result <- mean(c(result1,result2,result3, result4))
  result
}


unparallelizedfnc(rnorm(100000))

Upvotes: 2

Views: 1271

Answers (1)

Roland

Reputation: 132706

I corrected your function as suggested by @Jilber first:

unparallelizedfnc <- function(x) {

  result1 <- sum(x)
  result2 <- median(x)
  result3 <- min(x)
  result4 <- max(x)

  result <- mean(c(result1,result2,result3, result4))
  result
}


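parallelizedfnc <- function(x) {
  # mclapply() forks the R process, so mc.cores > 1 works on Linux and Mac OS X but not on Windows
  require(parallel)
  funs <- list(sum,median,min,max)
  # apply each function to x in a separate worker, then average the four results
  mean(do.call("c",mclapply(funs,function(fun) fun(x),mc.cores = 4)))
}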

set.seed(42)
x <- rnorm(1e8)
identical(unparallelizedfnc(x),parallelizedfnc(x))
#[1] TRUE

library(microbenchmark)
microbenchmark(unparallelizedfnc(x),parallelizedfnc(x),times=3)

# Unit: seconds
#                 expr      min       lq   median       uq      max neval
# unparallelizedfnc(x) 3.155736 3.166381 3.177027 3.195497 3.213967     3
#   parallelizedfnc(x) 5.047008 5.207747 5.368486 5.514221 5.659956     3

Note that sum et al. are too fast to benefit from parallelization. Because of the parallelization overhead, the parallel version takes even longer here (a median of about 5.4 s versus 3.2 s). I assume the functions in your real use case are less optimized.
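Since the question asks for something that also runs on Windows: mclapply relies on forking, which Windows does not support, so mc.cores > 1 raises an error there. A minimal sketch of a socket-cluster variant using parLapply from the same parallel package could look like the following. The names parallelizedfnc_psock and apply_fun and the cores argument are my own, made up for illustration, and I have not benchmarked this version.

```r
library(parallel)

# Hypothetical helper, defined at top level so that only the function itself
# (and not a large enclosing environment) is sent to the workers.
apply_fun <- function(fun, x) fun(x)

# Hypothetical cross-platform variant: uses a socket (PSOCK) cluster instead of forking.
parallelizedfnc_psock <- function(x, cores = 2L) {
  funs <- list(sum, median, min, max)
  cl <- makeCluster(cores)               # socket cluster, the default type
  on.exit(stopCluster(cl), add = TRUE)   # shut the workers down even on error
  # x is passed through ... and is therefore copied to each worker
  res <- parLapply(cl, funs, apply_fun, x = x)
  mean(unlist(res))
}

set.seed(42)
x <- rnorm(1e5)
all.equal(parallelizedfnc_psock(x), mean(c(sum(x), median(x), min(x), max(x))))
```

Starting the cluster and copying x to every worker adds overhead of its own, so this is only likely to pay off when each function call is expensive. If the function is called repeatedly, it may be worth creating the cluster once outside the function and reusing it.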

Upvotes: 6
