RobertoAS

Reputation: 89

Is it possible to use foreach in parallel for all nests of a function using input from the previous step in R?

I am running some heavy simulations in R, so I have been trying to parallelize my computations to speed them up. I have figured out how to use the foreach package to run a loop in parallel, and also how to use the same package to run nested loops in parallel. However, I don't know how to run recursions in parallel (or even whether it is possible to do so). Is it possible, and if so, how can I implement it in my example code below?
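For context, here are minimal versions of the two patterns I already have working; the sqrt(i) and i * j computations are just toy stand-ins for my actual simulation:

library(foreach)
library(doParallel)

registerDoParallel(cores = 4)

# a single loop in parallel
res_single <- foreach(i = 1:8, .combine = "c") %dopar% sqrt(i)

# nested loops in parallel, flattened with foreach's %:% nesting operator
res_nested <- foreach(i = 1:4, .combine = "rbind") %:%
  foreach(j = 1:3, .combine = "c") %dopar% {
    i * j
  }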

Below I provide example code of what I would like to do, and of what I have done that actually works (my real code is much more complicated than this and has a third loop; the code below is a simplification for clarity).

What I would like to achieve: Both loops run in parallel, and the output of the inside loop is used as input to the function it contains in the next step of the outside loop. This is shown in the code below. However, this code does not work, because it never actually updates next_step_data in memory. In other words, it is as if I were passing the original data to my_function() in all four iterations of the inside loop.

install.packages("pacman"); library(pacman)
pacman::p_load(mlogit, tidyverse, foreach, doParallel)

registerDoParallel(cores=4)

next_step_data <- data

output <- foreach(j = 1:1000, .combine = "rbind") %dopar% {

    # inner loop: apply my_function() to the current next_step_data
    next_step_data <- foreach(k = 1:4, .combine = "rbind") %dopar% {
      my_function(k, next_step_data)
    }

    # aggregate the stacked inner results for the next step
    next_step_data %>%
      group_by(var1, var2) %>%
      summarise(n = n() / 4) %>%
      ungroup()
  }

What I have managed to do: Only the inside loop runs in parallel. That is sort of fine, because my_function() is the most computationally intensive part of my code, but the outside loop runs 1000 times, so it would be great to have everything running in parallel.

install.packages("pacman"); library(pacman)
pacman::p_load(mlogit, tidyverse, foreach, doParallel)

registerDoParallel(cores=4)

next_step_data <- data

# outer loop sequential (%do%), inner loop parallel (%dopar%)
output <- foreach(j = 1:1000, .combine = "rbind") %do% {

    # inner loop: apply my_function() to the current next_step_data
    next_step_data <- foreach(k = 1:4, .combine = "rbind") %dopar% {
      my_function(k, next_step_data)
    }

    # aggregate the stacked inner results for the next step
    next_step_data %>%
      group_by(var1, var2) %>%
      summarise(n = n() / 4) %>%
      ungroup()
  }

Thanks for your help!

Upvotes: 0

Views: 69

Answers (1)

George Ostrouchov

Reputation: 551

Based on your sketch, here is an equivalent mclapply sketch:

library(parallel)
library(tidyverse)
## add other libs needed

outer_function <- function(i, next_step_data) {
    # run my_function() for k = 1:4 and stack the results
    ns_list <- lapply(1:4, my_function, next_step_data = next_step_data)
    next_step_data <- do.call(rbind, ns_list)

    # aggregate the stacked results
    next_step_data %>%
      group_by(var1, var2) %>%
      summarise(n = n() / 4) %>%
      ungroup()
}

# parallelize the outer loop across 4 forked workers
output <- mclapply(1:1000, outer_function, next_step_data = data, mc.cores = 4)
output <- do.call(rbind, output)

This assumes my_function(k, next_step_data) takes its parameters as stated. Adjust mc.cores to the number of cores you have available.

mclapply uses the Unix fork, which is copy-on-write: memory is shared (not duplicated) until you change something in it, and then only the pages written to are copied.
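Note that fork is not available on Windows, where mclapply only runs with mc.cores = 1. A minimal socket-cluster sketch with parLapply, assuming the same outer_function, my_function, and data as above:

library(parallel)

# socket cluster: workers are separate R processes, so nothing is
# shared; packages and objects must be set up on each worker explicitly
cl <- makeCluster(4)
clusterEvalQ(cl, library(tidyverse))
clusterExport(cl, c("my_function", "outer_function", "data"))

output <- parLapply(cl, 1:1000, outer_function, next_step_data = data)
output <- do.call(rbind, output)

stopCluster(cl)

Unlike fork, a socket cluster copies data to each worker up front, so it trades the copy-on-write memory savings for portability.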

Upvotes: 0
