sym246

Reputation: 1866

For loop using multicores in R with foreach

I have an extremely long piece of code made up of multiple user-defined functions, which are sourced at the start of my script. The whole script sits inside a for loop that reads in CSV files one by one, analyses them, and outputs two CSV files and a PNG per iteration. On average the code takes around 18 seconds per file, and there are normally around 150 to 200 files to analyse in one go.

This takes a long time, so I want to take advantage of the 8 cores on my PC.

I have changed my main for loop to foreach and added %dopar%; however, my code does not work.

An example is shown:

cl=makeCluster(8)
registerDoParallel(cl)

library(parallel)
library(foreach)
library(ggplot2)
library(data.table)

foreach(kk=1:2) %dopar% {
  Data=rnorm(60000,3,kk)
  Date=seq(as.POSIXct("2014-01-01 00:00:00"), length.out=60000, by="15 mins")
  DF=data.frame(Date,Data)

  DF$MeanDiff=sapply(DF$Data, function(x) abs(x-mean(DF$Data)))

  write.csv(data.table(DF), file="Data with difference from mean.csv", row.names=F)

  DF$Colour=c(rep("Pink",30000),rep("Blue",30000))

  file_name_data = "Test plot.jpg"
  png(filename=file_name_data,width=900,height=600,res=80)
  print(ggplot(DF, aes(Date, Data,colour=Colour, group=1))+geom_line()+geom_point()+
          scale_x_datetime(limits=c(as.POSIXct(Date[1]), as.POSIXct(Date[length(Date)])), labels = date_format("%d-%m-%y")))
  dev.off() 
}

I believe the problem is that the other loaded packages cannot be used inside the foreach loop. If this is the case, how do I rectify it? Secondly, would this also mean that any source files previously loaded outside of the loop could not be used, i.e. my user-defined functions?

I may be missing the point, but I imagine that there is an easier way to do this that I have not caught on to yet. Any advice would be appreciated.

Upvotes: 2

Views: 1221

Answers (2)

MikeJewski

Reputation: 357

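library(doParallel)
library(foreach)

n_workers <- 4                         # number of worker processes
registerDoParallel(cores = n_workers)

foreach(kk = 1:2) %dopar% {
    # load the packages inside the loop so every worker has them
    library(ggplot2)
    library(data.table)
    # your code
}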

I only started using foreach a while ago, so I may be wrong, but this is how I understand it so far. When you use foreach with doParallel, each worker is a new R instance, so you need to load your libraries again in each instance. Anything that is already in the workspace before foreach is called, and which is then used inside the foreach loop, is carried over to the new instances.
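As a rough sketch of the same idea, foreach's `.packages` argument loads the listed packages on every worker, which is an alternative to calling library() inside the loop. The helper `abs_diff_from_mean`, the output folder and the file names below are made up for illustration and stand in for the real sourced functions and paths. The loop index is put into each file name so that the iterations do not overwrite each other's output.

```r
library(doParallel)  # attaches foreach and parallel as dependencies

cl <- makeCluster(2)   # two workers, to match the two iterations
registerDoParallel(cl)

# Hypothetical stand-in for one of the sourced user-defined functions
abs_diff_from_mean <- function(x) abs(x - mean(x))

out_dir <- tempdir()   # assumed output folder

files <- foreach(kk = 1:2,
                 .packages = c("ggplot2", "data.table"),
                 .combine = c) %dopar% {
  DF <- data.frame(
    Date = seq(as.POSIXct("2014-01-01 00:00:00", tz = "UTC"),
               length.out = 600, by = "15 mins"),
    Data = rnorm(600, mean = 3, sd = kk)
  )
  # The helper is used directly in the loop body, so foreach sends it along
  DF$MeanDiff <- abs_diff_from_mean(DF$Data)

  # kk in the file names keeps each iteration's output separate
  csv_file <- file.path(out_dir, sprintf("diff_from_mean_%d.csv", kk))
  png_file <- file.path(out_dir, sprintf("test_plot_%d.png", kk))
  fwrite(as.data.table(DF), csv_file)

  p <- ggplot(DF, aes(Date, Data)) + geom_line()
  ggsave(png_file, plot = p, width = 9, height = 6, dpi = 80)

  csv_file  # value returned for this iteration
}

stopCluster(cl)  # release the workers when done
```

Here `files` collects the CSV paths written by the workers, which is a simple way to check that every iteration finished.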

Upvotes: 1

Berecht

Reputation: 1135

library(parallel)
library(foreach)
library(doSNOW)

cl <- makeCluster(8)  # 8 is the number of cores
registerDoSNOW(cl)

foreach(kk = 1:2) %dopar% {
  # your code
}

stopCluster(cl)  # shut the workers down when finished
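On the question of sourced files: with an explicit cluster object, one option is to source the file of user-defined functions once on each worker before the loop runs. The sketch below swaps in doParallel as the backend and uses `clusterExport` and `clusterEvalQ` from the parallel package; the helper file and the functions `centre` and `mean_abs_diff` are hypothetical stand-ins for the real sourced code. Sourcing on the worker puts the functions in that worker's global environment, so helpers that call each other can find one another.

```r
library(doParallel)  # attaches foreach and parallel as dependencies

cl <- makeCluster(2)
registerDoParallel(cl)

# Stand-in for the real file of user-defined functions (hypothetical content)
helper_file <- tempfile(fileext = ".R")
writeLines(c(
  "centre <- function(x) x - mean(x)",
  "mean_abs_diff <- function(x) mean(abs(centre(x)))"
), helper_file)

# Send the path to the workers, then source the file once on each of them.
# library() calls could be placed in the same clusterEvalQ() block.
clusterExport(cl, "helper_file", envir = environment())
invisible(clusterEvalQ(cl, source(helper_file)))

# The loop body can now call the sourced functions on any worker
res <- foreach(kk = 1:2, .combine = c) %dopar% {
  mean_abs_diff(c(0, 2 * kk))
}

stopCluster(cl)
```

Doing the setup once per worker rather than once per iteration keeps the loop body short, which may matter when there are 150 to 200 files to process.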

Upvotes: 0
