Reputation: 604
I want to parallelize a function called unparallelizedfnc. The function calls four other functions, each of which takes a long time to compute, and stores the results. At the end, the results are combined. Consider a toy example of my function (these are not the four real functions I call; they are for demonstration only).
How do I parallelize the computation of result1, result2, result3 and result4 on a computer with multiple cores? I would like it to work on Windows, Linux and Mac OS X. There is no need to benchmark the parallelized version in this case (it will be slower due to overhead, but in my real code it will be faster).
If the four results came from the same function (with different data), I could just use a parallel for loop (foreach) or a parallel apply, but in this case the functions are different.
unparallelizedfnc <- function(x) {
result1 <- sum(x)
result2 <- median(x)
result3 <- min(x)
result4 <- max(x)
result <- mean(c(result1,result2,result3, result4))
result
}
unparallelizedfnc(rnorm(100000))
Upvotes: 2
Views: 1271
Reputation: 132706
First, I corrected your function as suggested by @Jilber:
unparallelizedfnc <- function(x) {
result1 <- sum(x)
result2 <- median(x)
result3 <- min(x)
result4 <- max(x)
result <- mean(c(result1,result2,result3, result4))
result
}
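parallelizedfnc <- function(x) {
# mclapply() relies on forking, so mc.cores > 1 is not supported on Windows
require(parallel)
funs <- list(sum,median,min,max)
# evaluate each function on x in its own process, then average the four results
mean(do.call("c",mclapply(funs,function(fun) fun(x),mc.cores = 4)))
}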
set.seed(42)
x <- rnorm(1e8)
identical(unparallelizedfnc(x),parallelizedfnc(x))
#[1] TRUE
library(microbenchmark)
microbenchmark(unparallelizedfnc(x),parallelizedfnc(x),times=3)
# Unit: seconds
#                  expr      min       lq   median       uq      max neval
#  unparallelizedfnc(x) 3.155736 3.166381 3.177027 3.195497 3.213967     3
#    parallelizedfnc(x) 5.047008 5.207747 5.368486 5.514221 5.659956     3
Note that sum et al. are too fast to benefit from parallelization. Because of the parallelization overhead, the parallel version takes even longer than the serial one (a median of about 5.4 s versus 3.2 s above). I assume your real use case has less optimized functions.
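Since the question asks for something that also runs on Windows, here is a minimal sketch of the same idea using a socket (PSOCK) cluster from the parallel package instead of forking. The names parallelizedfnc_psock, apply_fun and the workers argument are made up for this illustration. I have not benchmarked it; expect the cost of copying x to every worker to dominate for cheap functions like these.

```r
library(parallel)

# Hypothetical helper: applies one function to the data on a worker.
# The second argument is called `data` so that it cannot clash with the
# arguments named `x` and `fun` that parLapply() uses internally.
apply_fun <- function(fun, data) fun(data)

# Hypothetical PSOCK variant of parallelizedfnc.
parallelizedfnc_psock <- function(x, workers = 4) {
  funs <- list(sum, median, min, max)
  # makeCluster() starts separate R processes (PSOCK by default),
  # which is available on Windows, Linux and Mac OS X.
  cl <- makeCluster(min(workers, length(funs)))
  on.exit(stopCluster(cl), add = TRUE)
  # Each list element (a function) is sent to a worker together with a copy of x.
  res <- parLapply(cl, funs, apply_fun, data = x)
  mean(unlist(res))
}

set.seed(42)
x <- rnorm(1e5)
result <- parallelizedfnc_psock(x)
```

The cluster is created and stopped inside the function to keep the example short. If you call it repeatedly, it is probably better to create the cluster once and pass it in, because starting the worker processes is itself expensive.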
Upvotes: 6