Reputation: 1866
I have a very long script built from several user-defined functions, which are sourced at the start of my code. The whole script sits inside a for loop, which reads in csv files one by one, analyses them, and outputs 2 csv files and a PNG per iteration. On average, the code takes around 18 seconds per file, and there are normally around 150 to 200 files to be analysed in one go.
This takes a long time, so I want to take advantage of the 8 cores on my PC.
I have changed my main for loop to foreach and have added %dopar%; however, my code does not work.
An example is shown:
cl=makeCluster(8)
registerDoParallel(cl)
library(parallel)
library(foreach)
library(ggplot2)
library(data.table)
foreach(kk=1:2) %dopar% {
  Data=rnorm(60000,3,kk)
  Date=seq(as.POSIXct("2014-01-01 00:00:00"), length.out=60000, by="15 mins")
  DF=data.frame(Date,Data)
  DF$MeanDiff=sapply(DF$Data, function(x) abs(x-mean(DF$Data)))
  write.csv(data.table(DF), file="Data with difference from mean.csv", row.names=F)
  DF$Colour=c(rep("Pink",30000),rep("Blue",30000))
  file_name_data = "Test plot.jpg"
  png(filename=file_name_data,width=900,height=600,res=80)
  print(ggplot(DF, aes(Date, Data,colour=Colour, group=1))+geom_line()+geom_point()+
    scale_x_datetime(limits=c(as.POSIXct(Date[1]), as.POSIXct(Date[length(Date)])), labels = date_format("%d-%m-%y")))
  dev.off()
}
I believe the problem is that the other loaded packages cannot be used inside the foreach loop. If this is the case, how do I rectify it? Secondly, would this also mean that any source files previously loaded outside of the loop could not be used, i.e. my user-defined functions?
I may be missing the point, but I imagine that there is an easier way to do this that I have not caught on to yet. Any advice would be appreciated.
Upvotes: 2
Views: 1221
Reputation: 357
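library(doParallel)
library(foreach)
cl <- 4  # number of workers; registerDoParallel() also accepts a plain count
registerDoParallel(cl)
foreach(kk=1:2) %dopar% {
  # load the packages inside the loop so that every worker has them
  library(ggplot2)
  library(data.table)
  #your code
}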
I only started using foreach a while ago, so I may be wrong, but this is how I understand it so far. When you use foreach with doParallel, each worker runs as a separate R instance, so your libraries have to be loaded again in each one; that is why the library() calls sit inside the loop. Objects that already exist in the workspace before foreach is called, and that are then used inside the foreach loop, are carried over into the new instances.
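As a rough illustration of both points, here is a minimal sketch, not a drop-in fix for the script in the question. It assumes doParallel is installed and that foreach is called at the top level of the script. The function abs_diff_from_mean is a hypothetical stand-in for one of the sourced user-defined functions, and the base package tools stands in for ggplot2 and data.table so that the sketch runs without them. It uses foreach's .packages argument as an alternative to calling library() inside the loop, and it gives every iteration its own output file name.

```r
library(doParallel)  # attaches foreach and parallel as dependencies

# Hypothetical stand-in for a user-defined function loaded with source()
abs_diff_from_mean <- function(x) abs(x - mean(x))

cl <- makeCluster(2)  # 2 workers keep the sketch light; raise this on an 8-core PC
registerDoParallel(cl)

out_dir <- tempdir()  # assumption: outputs go to a temporary folder

out_files <- foreach(kk = 1:2,
                     .packages = "tools",  # attached on every worker before the body runs
                     .combine = c) %dopar% {
  x <- rnorm(100, mean = 3, sd = kk)
  # abs_diff_from_mean and out_dir are used in the body, so foreach sends them to the workers
  df <- data.frame(Data = x, MeanDiff = abs_diff_from_mean(x))
  # one file name per iteration, so workers never write to the same path
  out <- file.path(out_dir, paste0("diff_from_mean_", kk, ".csv"))
  write.csv(df, file = out, row.names = FALSE)
  file_path_sans_ext(basename(out))  # from tools, attached through .packages
}

stopCluster(cl)
```

foreach also has an .export argument for naming objects explicitly when they are not picked up automatically.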
Upvotes: 1
Reputation: 1135
library(parallel)
library(foreach)
library(doSNOW)
cl <- makeCluster(8)  # 8 is the number of cores
registerDoSNOW(cl)
foreach(kk=1:2) %dopar% {
  #your code
}
stopCluster(cl)  # release the workers when finished
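With an explicit cluster object like cl, one way to make sourced user-defined functions available is to run source() once on every worker before the loop starts. The sketch below is only an illustration under stated assumptions: it uses doParallel rather than doSNOW so that it needs a single extra package, and helper_file with its two functions is a hypothetical stand-in for the real source files. clusterCall() comes from the parallel package.

```r
library(doParallel)  # attaches foreach and parallel as dependencies

# Hypothetical helper file standing in for the user's sourced scripts
helper_file <- tempfile(fileext = ".R")
writeLines(c(
  "centre <- function(x) x - mean(x)",
  "abs_diff_from_mean <- function(x) abs(centre(x))"
), helper_file)

cl <- makeCluster(2)
registerDoParallel(cl)

# Source the helper file once on each worker; source() evaluates the file
# in that worker's global environment by default
invisible(clusterCall(cl, source, helper_file))

max_diff <- foreach(kk = 1:2, .combine = c) %dopar% {
  # both helpers now exist on the worker, including the nested call to centre()
  max(abs_diff_from_mean(c(0, kk, 2 * kk)))
}

stopCluster(cl)
```

Sourcing on the workers means helpers that call other helpers can find each other there, at the cost of one extra setup step per cluster.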
Upvotes: 0