Reputation: 443
I'm trying to pass my own sampler to the train function of the caret package (because my data is imbalanced) and then train the model in parallel. Without the sampler, training works fine. With the sampler but without parallel processing, it also works fine. But when I run it in parallel with the sampler, I get an error. I have tried this on two different systems; both fail, but with different errors. Here is an example:
library(caret)
set.seed(1)
data(iris)
library(DMwR)
library(doParallel)
cl <- makeCluster(3)
# cl <- makeCluster(1) # uncommenting this will make the code work
print(cl)
registerDoParallel(cl)
smote_wrapper <- list(
  name = "custom_smoting",
  func = function(x, y) {
    #print(dim(x))
    print(length(y))
    data <- cbind(x, data.frame(Class = y))
    #print(table(data$Class))
    print("calling smote")
    final <- SMOTE(Class~., data, perc.over = 50, perc.under = 50)
    print("smote over")
    #print(dim(final))
    final$Class <- as.factor(final$Class)
    print(table(final$Class))
    class_index <- which(colnames(final) == "Class")
    print(paste("dim:", dim(final)))
    result <- list(x = final[,-class_index], y = final$Class)
    result
  },
  first = FALSE
)
data(iris)
control <- trainControl(sampling = smote_wrapper)
model <- train(Species~., iris, method = "svmLinear2", trControl = control)
stopCluster(cl)
On one system it stops training the model and gives the error:
Error in { : task 1 failed - "object 'out2' not found"
And on the other system it gives:
Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa
 Min.   : NA   Min.   : NA
 1st Qu.: NA   1st Qu.: NA
 Median : NA   Median : NA
 Mean   :NaN   Mean   :NaN
 3rd Qu.: NA   3rd Qu.: NA
 Max.   : NA   Max.   : NA
 NA's   :3     NA's   :3
Error: Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
Maybe the sampler doesn't work in parallel?
I was using the latest CRAN release of caret (6.0.77), but because of another error ("optimismBoot not found") I had to install the latest version from GitHub (devtools::install_github).
Upvotes: 1
Views: 504
Reputation: 13581
Looks like you might need to export your packages and variables to the cluster:
registerDoParallel(cl)
# try these lines
clusterEvalQ(cl, { library(DMwR) })
clusterExport(cl, "smote_wrapper")
In parallel mode, caret looks for packages and variables in each worker's environment; anything you have not loaded or exported there will not be available. Hope this helps.
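To see the effect in isolation, here is a minimal sketch that uses only the base parallel package, so it runs without caret or DMwR. The names scale_by and my_helper are hypothetical stand-ins for smote_wrapper and whatever it depends on; the assumption is that the workers are separate R sessions, which is what makeCluster creates by default.

```r
library(parallel)

cl <- makeCluster(2)

# Objects defined only in the main session (hypothetical names).
scale_by <- 10
my_helper <- function(v) v * scale_by

# Before exporting, the workers do not know about my_helper.
before <- clusterEvalQ(cl, exists("my_helper"))
print(unlist(before))  # FALSE FALSE

# Load packages on every worker and copy the objects over.
clusterEvalQ(cl, library(stats))
clusterExport(cl, c("my_helper", "scale_by"), envir = environment())

after <- clusterEvalQ(cl, exists("my_helper"))
print(unlist(after))   # TRUE TRUE

# The workers can now call the helper.
result <- unlist(parLapply(cl, 1:3, function(i) my_helper(i)))
print(result)          # 10 20 30

stopCluster(cl)
```

Note that my_helper depends on scale_by, so both have to be exported. The same applies to a custom sampler: export the sampler and everything it refers to, and load the package it calls on each worker.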
Upvotes: 3