Reputation: 14400
I would like to use foreach in conjunction with data.table (v.1.8.7) to load files and bind them. foreach is not parallelizing, and it returns a warning...
write.table(matrix(rnorm(5e6),nrow=5e5),"myFile.csv",quote=F,sep=",",row.names=F,col.names=T)
library(data.table);
# I use fread from data.table 1.8.7 (dev) for performance and usability
DT = fread("myFile.csv")
Now suppose I have n of those files to load and row-bind, and I would like to parallelize it. (I am on Windows, so no forking.)
allFiles = rep("myFile.csv",4) # you can change 4 to whatever
using lapply
f1 <- function(allFiles){
DT <- lapply(allFiles, FUN=fread) #will load myFile.csv sequentially with fread, once per entry in allFiles
DT <- rbindlist(DT);
return(DT);
}
using parallel (part of R as of 2.14.0)
library(parallel)
f2 <- function(allFiles){
mc <- detectCores(); #how many cores?
cl <- makeCluster(mc); #build the cluster
DT <- parLapply(cl,allFiles,fun=fread); #call fread on each core (well... using each core at least)
stopCluster(cl);
DT <- rbindlist(DT);
return(DT);
}
now I want to use foreach
library(foreach)
f3 <- function(allFiles){
DT <- foreach(myFile=allFiles, .combine='rbind', .inorder=FALSE) %dopar% fread(myFile)
return(DT);
}
Here are some benchmarks confirming I can't get foreach working
system.time(DT <- f1(allFiles));
user system elapsed
34.61 0.14 34.84
system.time(DT <- f2(allFiles));
user system elapsed
1.03 0.40 24.30
system.time(DT <- f3(allFiles));
executing %dopar% sequentially: no parallel backend registered
user system elapsed
35.05 0.22 35.38
Upvotes: 4
Views: 2636
Reputation: 132989
Just to get this answered:
As the warning message tells you, there is no parallel backend registered for foreach. Read this vignette to learn how to do that.
Simple example from the vignette:
library(doParallel)
cl <- makeCluster(3)
registerDoParallel(cl)
foreach(i=1:3) %dopar% sqrt(i)
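Applied to f3 from the question, the same idea might look roughly like the sketch below. It assumes doParallel is installed. The nCores argument is a hypothetical parameter added for illustration and is not part of the original f3. Results are collected as a list and bound with rbindlist afterwards, rather than through .combine='rbind'; that is a swapped-in choice, not something the question requires.

```r
library(data.table)
library(doParallel)  # also attaches foreach and parallel

# nCores is a hypothetical parameter, not in the original f3
f3 <- function(allFiles, nCores = 2) {
  cl <- makeCluster(nCores)     # socket cluster, so it also works on Windows
  on.exit(stopCluster(cl))      # shut the workers down even if fread fails
  registerDoParallel(cl)        # without this, %dopar% falls back to sequential
  # .packages loads data.table on each worker so that fread is available there
  DTs <- foreach(myFile = allFiles, .packages = "data.table") %dopar% fread(myFile)
  rbindlist(DTs)                # bind the list of data.tables in one step
}
```

Calling f3(allFiles) should then no longer print the "no parallel backend registered" warning. Whether it actually beats f2 depends on the machine and the files, so it is worth timing both.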
Upvotes: 2