Reputation: 3943
How do I get parallelizaton of code to work in r in Windows? Include a simple example. Posting this self-answered question because this was rather unpleasant to get working. You'll find package parallel does NOT work on its own, but package snow works very well.
Upvotes: 38
Views: 43522
Reputation: 11
For what it is worth. I was running into the same problem but couldn't get any of these to work. I eventually learned that Rstudio has a 'jobs' pane and can run models in the background each on it's own core. so what I did was divy-up my model into 10 segments (it was iterative over 100 vectors so 10 scripts of 10 vectors each) and ran each as a separate job. that way when one finished I could use the output from it immediately and I could keep working on my script without waiting for each model to finish. here is the link all about using jobs https://blog.rstudio.com/2019/03/14/rstudio-1-2-jobs/
Upvotes: 1
Reputation: 1952
I'm posting a cross-platform answer here because all the other answers I found were over-complicated for what I needed to accomplish. I'm using an example where I'm reading in all sheets of an excel workbook.
# read in the spreadsheet
parallel_read <- function(file){
# detect available cores and use 70%
numCores = round(parallel::detectCores() * .70)
# check if os is windows and use parLapply
if(.Platform$OS.type == "windows") {
cl <- makePSOCKcluster(numCores)
parLapply(cl, file, readxl::read_excel)
stopCluster(cl)
return(dfs)
# if not Windows use mclapply
} else {
dfs <-parallel::mclapply(excel_sheets(file),
readxl::read_excel,
path = file,
mc.cores=numCores)
return(dfs)
}
}
Upvotes: 3
Reputation: 3943
Posting this because this took me bloody forever to figure out. Here's a simple example of parallelization in r that will let you test if things are working right for you and get you on the right path.
library(snow)
z=vector('list',4)
z=1:4
system.time(lapply(z,function(x) Sys.sleep(1)))
cl<-makeCluster(###YOUR NUMBER OF CORES GOES HERE ###,type="SOCK")
system.time(clusterApply(cl, z,function(x) Sys.sleep(1)))
stopCluster(cl)
You should also use library doSNOW to register foreach to the snow cluster, this will cause many packages to parallelize automatically. The command to register is registerDoSNOW(cl)
(with cl
being the return value from makeCluster()
) , the command that undoes registration is registerDoSEQ()
. Don't forget to turn off your clusters.
Upvotes: 50
Reputation: 12808
This worked for me, I used package doParallel, required 3 lines of code:
# process in parallel
library(doParallel)
cl <- makeCluster(detectCores(), type='PSOCK')
registerDoParallel(cl)
# turn parallel processing off and run sequentially again:
registerDoSEQ()
Calculation of a random forest decreased from 180 secs to 120 secs (on a Windows computer with 4 cores).
Upvotes: 18
Reputation: 113
Based on the information here I was able to convert the following code into a parallelised version that worked under R Studio on Windows 7.
Original code:
#
# Basic elbow plot function
#
wssplot <- function(data, nc=20, seed=1234){
wss <- (nrow(data)-1)*sum(apply(data,2,var))
for (i in 2:nc){
set.seed(seed)
wss[i] <- sum(kmeans(data, centers=i, iter.max=30)$withinss)}
plot(1:nc, wss, type="b", xlab="Number of clusters",
ylab="Within groups sum of squares")
}
Parallelised code:
library("parallel")
workerFunc <- function(nc) {
set.seed(1234)
return(sum(kmeans(my_data_frame, centers=nc, iter.max=30)$withinss)) }
num_cores <- detectCores()
cl <- makeCluster(num_cores)
clusterExport(cl, varlist=c("my_data_frame"))
values <- 1:20 # this represents the "nc" variable in the wssplot function
system.time(
result <- parLapply(cl, values, workerFunc) ) # paralel execution, with time wrapper
stopCluster(cl)
plot(values, unlist(result), type="b", xlab="Number of clusters", ylab="Within groups sum of squares")
Not suggesting it's perfect or even best, just a beginner demonstrating that parallel does seem to work under Windows. Hope it helps.
Upvotes: 5
Reputation: 1585
I think these libraries will help you:
foreach (facilitates executing the loop in parallel)
doSNOW (I think you already use it)
doMC (multicore functionality of the parallel package)
May these article also help you
http://vikparuchuri.com/blog/parallel-r-loops-for-windows-and-linux/
http://www.joyofdata.de/blog/parallel-computing-r-windows-using-dosnow-foreach/
Upvotes: 4