Reputation: 4289
I want to parallelize a for loop in R, in a way that will work on both Windows and Linux.
Most solutions online seem to use something like:
plan(multisession, workers = availableCores())
results <- future_lapply(mydata, myfunc) # from the future.apply package
plan(sequential)
I'm finding that the first line alone, plan(multisession, workers = availableCores()),
takes about 9 seconds and maxes out all my CPUs.
availableCores()
returns instantly. If I set workers
to half the number of cores on my machine, it uses about 50% of my cores for about half as long (roughly 5 seconds).
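For reference, this is roughly how I'm measuring the setup cost; mydata and myfunc are placeholders standing in for my real data and function:

```r
library(future)
library(future.apply)

# Time only the plan() setup step -- this alone takes ~9 seconds for me
print(system.time(
  plan(multisession, workers = availableCores())
))

# Placeholder inputs for the actual parallel work
mydata <- 1:100
myfunc <- function(x) x^2
results <- future_lapply(mydata, myfunc)

plan(sequential)  # shut the workers down again
```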
Note that I haven't even reached the line where the actual work is done. This is just setting up a 'plan', and I have no idea what that is.
What is it doing? Why does it take so long and use so much CPU? (This is a fresh R session, so there aren't large variables to copy across into each spawned process.)
In contrast, Python's multiprocessing.Pool()
returns instantly. Why can't R do the same thing?
Upvotes: 1
Views: 52