ECII

Reputation: 10619

Multicore generation of plots

I have a for loop that generates a plot via png() and dev.off() and saves it to the working directory.

The loop is similar to the following example:

test.df<-data.frame(id=1:25000, x=rnorm(25000),y=rnorm(25000))

for (i in test.df$id){
  plot(test.df$x[test.df$id==i], test.df$y[test.df$id==i], xlab="chi",ylab="psi")
}

The for loop runs and generates thousands of plots. Is it possible to run it in parallel on all 8 cores of my system so that I get the plots faster?

PS. The code is only an example. My original problem and plots are much more complicated, so please don't read too much into the example itself.

Upvotes: 3

Views: 1361

Answers (3)

csgillespie

Reputation: 60452

Provided you are using a recent version of R (2.14.0 or later, which bundles the parallel package), this should be straightforward. The trick is to write a function that can run on any core, in any order. First we create our data frame:

test.df = data.frame(id=1:250, x=rnorm(250),y=rnorm(250))

Next we create the function that runs on each core:

#I could also pass the row or the entire data frame
myplot = function(id) {
  fname = paste0("/tmp/plot", id, ".png")
  png(fname)
  plot(test.df$x[id], test.df$y[id], 
      xlab="chi",ylab="psi")
  dev.off()
  return(fname)
}

Then I load the parallel package (this comes with base R)

library(parallel)

and then use mclapply

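no_of_cores = 8
## Non-Windows only: mclapply relies on forking
mclapply(1:nrow(test.df), myplot, 
         mc.cores = no_of_cores)

## All operating systems: start worker processes explicitly
cl = makeCluster(no_of_cores)
## Workers start with an empty workspace, so send them the data
clusterExport(cl, "test.df")
parSapply(cl, 1:nrow(test.df), myplot)
stopCluster(cl)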

There are two advantages here:

  1. The package parallel comes with R, so we don't need to install anything extra
  2. We can switch off the "parallel" part:

    sapply(1:nrow(test.df), myplot)
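
The second point can be wrapped in a small helper, which is handy when debugging a plot function before scaling up. The following is only a minimal sketch: `plot_one`, `run_plots`, `out_dir`, and `n_cores` are hypothetical names, files go to a temporary directory, and the data are passed as function arguments instead of using clusterExport.

```r
library(parallel)

# Plot the point(s) belonging to one id and return the file name.
plot_one <- function(id, df, out_dir) {
  fname <- file.path(out_dir, paste0("plot", id, ".png"))
  png(fname)
  on.exit(dev.off())  # close the device even if plot() fails
  plot(df$x[df$id == id], df$y[df$id == id], xlab = "chi", ylab = "psi")
  fname
}

# Hypothetical helper: sequential when n_cores <= 1, otherwise a socket cluster.
run_plots <- function(df, out_dir, n_cores = 1) {
  if (n_cores <= 1) {
    return(sapply(df$id, plot_one, df = df, out_dir = out_dir))
  }
  cl <- makeCluster(n_cores)
  on.exit(stopCluster(cl))
  # Extra arguments after the function are forwarded to every call
  parSapply(cl, df$id, plot_one, df = df, out_dir = out_dir)
}

test.df <- data.frame(id = 1:20, x = rnorm(20), y = rnorm(20))
out_dir <- tempdir()  # assumption: any writable directory works

files_seq <- run_plots(test.df, out_dir)               # easy to debug
files_par <- run_plots(test.df, out_dir, n_cores = 2)  # same result, parallel
```

Both calls return the same vector of file names, so the sequential path can be used to check the plot function before turning on more cores.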
    

Upvotes: 6

agstudy

Reputation: 121568

Since mclapply relies on forking, which is not available on Windows, here is a solution for Windows users that uses the parallel package.

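library(parallel)
cl <- makeCluster(8)
parSapply(cl, 1:20, fun, fun.args)  # fun and fun.args are placeholders for your own function and its extra arguments
stopCluster(cl)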
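
To make the placeholders concrete, here is a toy sketch of how extra arguments reach the worker function. The function `fun` and its `power` argument are made up for illustration, and 2 workers are used only to keep the example light.

```r
library(parallel)

# Hypothetical worker function; `power` plays the role of fun.args.
fun <- function(i, power) i^power

cl <- makeCluster(2)
# Arguments after the function are forwarded to each call of fun
res <- parSapply(cl, 1:20, fun, power = 2)
stopCluster(cl)
```

In a real plotting job the data frame itself could be passed the same way, which avoids relying on objects that exist only in the main R session.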

Upvotes: 3

redmode

Reputation: 4941

With the foreach package you only have to modify your core code minimally. You can also pick whichever backend suits your OS or other constraints.

##
## Working dir and data generation
##
setwd("/path/to")
N <- 25000
test.df<-data.frame(id=1:N, x=rnorm(N),y=rnorm(N))

##
## Making a cluster
##
require(doSNOW) # Or any other backend of your choice
NC <- 8         # Number of nodes in cluster, i.e. cores
cl <- makeCluster(rep("localhost", NC), type="SOCK")
registerDoSNOW(cl)

## 
## Core loop
##
foreach(i=1:N) %dopar% {
  png(paste("plot",i,".png",sep=""))
  plot(test.df$x[test.df$id==i], test.df$y[test.df$id==i], xlab="chi",ylab="psi")
  dev.off()
}

##
## Stop cluster
##
stopCluster(cl)

It's easy to go for one core: just substitute %dopar% with %do%.
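
As written, the loop returns the value of dev.off() for every iteration. A possible variation is to return the file name instead and collect the names with `.combine = c`, so the result can be checked afterwards. This sketch assumes the foreach and doParallel packages are installed (doParallel is swapped in for doSNOW here), and `out_dir`, the small `N`, and the 2-worker cluster are illustrative choices only.

```r
library(foreach)
library(doParallel)  # assumption: installed; also attaches the parallel package

N <- 10
test.df <- data.frame(id = 1:N, x = rnorm(N), y = rnorm(N))
out_dir <- tempdir()  # assumption: any writable directory works

cl <- makeCluster(2)
registerDoParallel(cl)

# Each iteration writes one PNG and returns its path; .combine = c
# turns the per-iteration results into a single character vector.
files <- foreach(i = test.df$id, .combine = c) %dopar% {
  fname <- file.path(out_dir, paste0("plot", i, ".png"))
  png(fname)
  plot(test.df$x[test.df$id == i], test.df$y[test.df$id == i],
       xlab = "chi", ylab = "psi")
  dev.off()
  fname
}

stopCluster(cl)
```

Afterwards `all(file.exists(files))` gives a quick check that every plot was written.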

Upvotes: 5
