Reputation: 4289
I want to parallelize a for loop in R, in a way that will work on both Windows and Linux.
Most solutions online seem to use something like:
plan(multisession, workers = availableCores())
results <- future_lapply(mydata, myfunc) # from the future.apply package
plan(sequential)
I'm finding that the first line alone, plan(multisession, workers = availableCores()),
takes about 9 seconds and maxes out all my CPUs.
availableCores()
returns instantly. If I set workers
to half the number of cores on my machine, it uses about 50% of my cores for about half as long (roughly 5 seconds).
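For reference, this is roughly how I'm measuring the setup cost; mydata and myfunc are placeholders standing in for my real data and function:

```r
library(future)
library(future.apply)

# Time only the plan() setup step -- this alone takes ~9 seconds for me
print(system.time(
  plan(multisession, workers = availableCores())
))

# Placeholder inputs for the actual parallel work
mydata <- 1:100
myfunc <- function(x) x^2
results <- future_lapply(mydata, myfunc)

plan(sequential)  # shut the workers down again
```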
Note that I haven't even reached the line where the actual work is done. This is just setting up a 'plan', and I have no idea what that is.
What is it doing? Why does it take so long and use so much CPU? (This is a fresh R session, so there aren't large variables to copy across into each spawned process.)
In contrast, Python's multiprocessing.Pool()
returns instantly. Why can't R do the same thing?
Upvotes: 1
Views: 52