Matthya C

Reputation: 83

Convert R apply statement to lapply for parallel processing

I have the following R "apply" statement:

for (i in 1:NROW(dataframe_stuff_that_needs_lookup_from_simulation)) {
    matrix_of_sums[, i] <-
        apply(simulation_results[, colnames(simulation_results) %in%
            dataframe_stuff_that_needs_lookup_from_simulation[i, ]], 1, sum)
}

So, I have the following data structures:

simulation_results: A matrix with column names that identify every possible piece of desired simulation lookup data for 2000 simulations (rows).

dataframe_stuff_that_needs_lookup_from_simulation: Contains, among other items, fields whose values match the column names in the simulation_results data structure.

matrix_of_sums: When the function is run, a 2000 row x 250,000 column (# of simulations x items being simulated) structure meant to hold the simulation results.

So, the apply function is looking up the dataframe column values for each row in a 250,000-row data set, computing the sum, and storing it in the matrix_of_sums data structure.
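For reference, here is a minimal toy version of the setup described above (the names `sim` and `lookup` and all sizes are made up for illustration; the real structures are far larger):

```r
# Toy stand-ins for the real structures: 5 simulations (rows) x 6 columns
sim <- matrix(1:30, nrow = 5, ncol = 6,
              dimnames = list(NULL, paste0("item", 1:6)))

# Each row of the lookup table names the simulation columns to sum
lookup <- data.frame(a = c("item1", "item2"),
                     b = c("item3", "item5"),
                     stringsAsFactors = FALSE)

# Same loop structure as the question, on the toy data
sums <- matrix(nrow = nrow(sim), ncol = nrow(lookup))
for (i in 1:nrow(lookup)) {
    sums[, i] <- apply(sim[, colnames(sim) %in% lookup[i, ]], 1, sum)
}
# sums[, 1] is 12 14 16 18 20 (columns item1 + item3 of sim)
```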

Unfortunately, this processing takes a very long time. I have explored the use of rowSums as an alternative, and it has cut the processing time in half, but I would like to try multi-core processing to see if that cuts processing time even more. Can someone help me convert the code above to "lapply" from "apply"?

Thanks!

Upvotes: 4

Views: 15644

Answers (2)

Carl Boneri

Reputation: 2722

Without any applicable sample data to go off of, the process would look like this:

  • Create a holding matrix (matrix_of_sums)
  • Loop by row through the variable table (dataframe_stuff_that_needs_lookup_from_simulation)
  • Find the matching column indices within the simulation model (simulation_results)
  • Bind the rowSums into the holding matrix (matrix_of_sums)

I recreated a sample set; the data are meaningless, but the same process should work for your data:

# mclapply() forks worker processes, so the global assignment operator `<<-`
# would not propagate changes back to the parent session; instead, return
# each row-sum vector and bind the list into the holding matrix afterwards.
sums_list <- parallel::mclapply(1:nrow(ts_df), function(i){
   # Store the row to its own variable for ease
   d <- ts_df[i,]
   # Row sums over the matching simulation columns
   rowSums(sim_df[, which(colnames(sim_df) %in% colnames(d))])
}, mc.cores = parallel::detectCores())
# Holding matrix which is our end-goal: one column per lookup row
msums <- do.call(cbind, sums_list)

Upvotes: 0

CPak

Reputation: 13581

With base R parallel, try:

library(parallel)
cl <- makeCluster(detectCores())
# PSOCK workers start with empty workspaces, so export the data to them first
clusterExport(cl, c("simulation_results",
                    "dataframe_stuff_that_needs_lookup_from_simulation"))
matrix_of_sums <- parLapply(cl, 1:nrow(dataframe_stuff_that_needs_lookup_from_simulation), function(i)
    rowSums(simulation_results[,colnames(simulation_results) %in% 
        dataframe_stuff_that_needs_lookup_from_simulation[i,]]))
stopCluster(cl)
ans <- Reduce("cbind", matrix_of_sums)

You could also try foreach with %dopar%:

library(doParallel)  # will load parallel, foreach, and iterators
cl <- makeCluster(detectCores())
registerDoParallel(cl)
matrix_of_sums <- foreach(i = 1:NROW(dataframe_stuff_that_needs_lookup_from_simulation)) %dopar% {
    rowSums(simulation_results[,colnames(simulation_results) %in% 
    dataframe_stuff_that_needs_lookup_from_simulation[i,]])
}
stopCluster(cl)
ans <- Reduce("cbind", matrix_of_sums)

I wasn't quite sure what output format you wanted at the end, but it looks like you're doing a cbind of each result. Let me know if you're expecting something else.
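One note on assembling the output: with 250,000 list elements, Reduce("cbind", ...) re-copies the growing matrix on every step, so a single do.call(cbind, ...) is usually much faster. A sketch with a toy list (the three short vectors stand in for the parLapply/foreach results):

```r
# Toy list of row-sum vectors standing in for the parallel output
matrix_of_sums <- list(c(1, 2), c(3, 4), c(5, 6))

# Bind all columns in one call instead of growing the matrix pairwise
ans <- do.call(cbind, matrix_of_sums)
dim(ans)  # 2 rows x 3 columns
```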

Upvotes: 7
