Reputation: 10619
I have a for loop which generates a plot via png() and dev.off() and saves it to the working directory.
The loop I have is similar to the following example
test.df<-data.frame(id=1:25000, x=rnorm(25000),y=rnorm(25000))
for (i in test.df$id){
plot(test.df$x[test.df$id==i], test.df$y[test.df$id==i], xlab="chi",ylab="psi")
}
The for loop will run and generate thousands of plots. Is it possible to make it run in parallel on all 8 cores of my system so that I can get the plots faster?
PS. The code is an example. My original problem and plots are much more complicated. Don't go viral on the example.
Upvotes: 3
Views: 1361
Reputation: 60452
Provided you are using a recent version of R (2.14.0 or later, where the parallel package is included), this should be straightforward. The trick is to create a function that can be run on any core in any order. First we create our data frame:
test.df = data.frame(id=1:250, x=rnorm(250),y=rnorm(250))
Next we create the function that runs on each core:
#I could also pass the row or the entire data frame
myplot = function(id) {
fname = paste0("/tmp/plot", id, ".png")
png(fname)
plot(test.df$x[id], test.df$y[id],
xlab="chi",ylab="psi")
dev.off()
return(fname)
}
Then I load the parallel package (this comes with base R)
library(parallel)
and then use mclapply
no_of_cores = 8
##Non windows
mclapply(1:nrow(test.df), myplot,
mc.cores = no_of_cores)
##All OS's
cl = makeCluster(no_of_cores)
clusterExport(cl, "test.df")
parSapply(cl, 1:nrow(test.df), myplot)
stopCluster(cl)
There are two advantages here:
First, parallel comes with R, so we don't need to install anything extra.
Second, we can switch off the "parallel" part:
sapply(1:nrow(test.df), myplot)
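The switch between parallel and sequential runs can be wrapped in a small helper. The following is only a sketch under assumptions: run_plots and its use_parallel flag are hypothetical names, detectCores() replaces the hard-coded 8, the data frame is shrunk to 6 rows, and the plots go to a directory chosen on the master (out_dir, here tempdir()) rather than /tmp.

```r
library(parallel)

test.df <- data.frame(id = 1:6, x = rnorm(6), y = rnorm(6))
out_dir <- tempdir()  # assumption: evaluated once, on the master process

myplot <- function(id) {
  fname <- file.path(out_dir, paste0("plot", id, ".png"))
  png(fname)
  plot(test.df$x[id], test.df$y[id], xlab = "chi", ylab = "psi")
  dev.off()
  fname
}

# Hypothetical helper: the same plotting function, run in parallel or not
run_plots <- function(ids, use_parallel = TRUE) {
  if (!use_parallel) return(sapply(ids, myplot))
  n_cores <- detectCores()
  if (is.na(n_cores)) n_cores <- 1  # detectCores() may return NA
  cl <- makeCluster(min(n_cores, length(ids)))
  on.exit(stopCluster(cl))  # stop the workers even if a plot fails
  # Export the objects myplot relies on, looked up where myplot was defined
  clusterExport(cl, c("test.df", "out_dir"), envir = environment(myplot))
  parSapply(cl, ids, myplot)
}

seq_files <- run_plots(1:3, use_parallel = FALSE)
par_files <- run_plots(4:6, use_parallel = TRUE)
```

Running the sequential path first is a convenient way to debug the plotting function, since errors raised on workers are harder to trace.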
Upvotes: 6
Reputation: 121568
Since mclapply is not supported on Windows, here is a solution for Windows users using the parallel package.
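library(parallel)
cl <- makeCluster(8)
parSapply(cl, 1:20, fun, fun.args)  # fun and fun.args are placeholders for your own function and its extra arguments
stopCluster(cl)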
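As a minimal sketch of what fun and fun.args might look like in practice: the hypothetical save_plot below takes the id as its first argument, and the data frame and output directory are forwarded through parSapply's extra arguments, so nothing has to be exported to the workers separately. The names save_plot, plot_df and out_dir, the use of tempdir(), and the two-worker cluster are assumptions for illustration.

```r
library(parallel)

# Hypothetical plotting function: the first argument is the element that
# parSapply iterates over; the others arrive through parSapply's `...`.
save_plot <- function(id, plot_df, out_dir) {
  fname <- file.path(out_dir, paste0("plot", id, ".png"))
  png(fname)
  on.exit(dev.off())  # close the device even if plot() fails
  rows <- plot_df$id == id
  plot(plot_df$x[rows], plot_df$y[rows], xlab = "chi", ylab = "psi")
  fname
}

plot_df <- data.frame(id = 1:6, x = rnorm(6), y = rnorm(6))
out_dir <- tempdir()  # evaluated on the master, so the files outlive the workers

cl <- makeCluster(2)  # assumption: 2 workers; raise this to match your cores
files <- parSapply(cl, plot_df$id, save_plot,
                   plot_df = plot_df, out_dir = out_dir)
stopCluster(cl)
```

Passing the data as an argument keeps the function self-contained, at the cost of sending the data frame to each worker; for very large data, exporting it once with clusterExport may be the better trade-off.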
Upvotes: 3
Reputation: 4941
With the foreach package you only have to modify your core code minimally. You can also choose whichever backend suits your OS or other constraints.
##
## Working dir and data generation
##
setwd("/path/to")
N <- 25000
test.df<-data.frame(id=1:N, x=rnorm(N),y=rnorm(N))
##
## Making a cluster
##
require(doSNOW) # Or any other backend of your choice
NC <- 8 # Number of nodes in cluster, i.e. cores
cl <- makeCluster(rep("localhost", NC), type="SOCK")
registerDoSNOW(cl)
##
## Core loop
##
foreach(i=1:N) %dopar% {
png(paste("plot",i,".png",sep=""))
plot(test.df$x[test.df$id==i], test.df$y[test.df$id==i], xlab="chi",ylab="psi")
dev.off()
}
##
## Stop cluster
##
stopCluster(cl)
It's easy to go for one core: just substitute %dopar% with %do%.
Upvotes: 5