When running a benchmark experiment on multiple algorithms, with tuning wrappers etc., multiple models are returned for each algorithm.
What is the canonical (or at least an effective) way of extracting each individual tuned model, together with its tuned hyperparameters, so that it can be accessed on its own and used for predictions without the baggage of all the other models?
Reproducible Example
# Load required packages
library(mlr)
library(parallelMap)
library(parallel)
# Tuning budget and inner resampling size
iterations = 10L  # random-search iterations per tuning run
cv_iters = 2      # folds for the inner cross-validation
### classif.gamboost ############################################################################################################################
classif_gamboost = makeLearner("classif.gamboost", predict.type="prob")
##The wrappers are presented in reverse order of application
###One-Hot Encoding
classif_gamboost = makeDummyFeaturesWrapper(classif_gamboost, method = "1-of-n")
###Missing Data Imputation
classif_gamboost = makeImputeWrapper(classif_gamboost,
  classes = list(numeric = imputeConstant(-99999),
                 integer = imputeConstant(-99999),
                 factor  = imputeConstant("==Missing==")),
  dummy.type = "numeric", dummy.classes = c("numeric", "integer"))
##### Tuning #####
inner_resamp = makeResampleDesc("CV", iters=cv_iters)
ctrl = makeTuneControlRandom(maxit=iterations)
hypss = makeParamSet(
  makeDiscreteParam("baselearner", values = c("btree")),  # alternatives: "bols", "bbs"
  makeIntegerParam("dfbase", lower = 1, upper = 5),
  makeDiscreteParam("family", values = c("Binomial")),
  makeDiscreteParam("mstop", values = c(10, 50, 100, 250, 500, 1000))
)
classif_gamboost = makeTuneWrapper(classif_gamboost, resampling = inner_resamp,
  par.set = hypss, control = ctrl,
  measures = list(auc, logloss, f1, ber, acc, bac, mmce, timetrain),
  show.info = TRUE)
### classif.gamboost ############################################################################################################################
### Random Forest ############################################################################################################################
classif_rforest = makeLearner("classif.randomForestSRC", predict.type="prob")
##The wrappers are presented in reverse order of application
###One-Hot Encoding
classif_rforest = makeDummyFeaturesWrapper(classif_rforest, method = "1-of-n")
###Missing Data Imputation
classif_rforest = makeImputeWrapper(classif_rforest,
  classes = list(numeric = imputeConstant(-99999),
                 integer = imputeConstant(-99999),
                 factor  = imputeConstant("==Missing==")),
  dummy.type = "numeric", dummy.classes = c("numeric", "integer"))
##### Tuning #####
inner_resamp = makeResampleDesc("CV", iters=cv_iters)
ctrl = makeTuneControlRandom(maxit=iterations)
hypss = makeParamSet(
  makeIntegerParam("mtry", lower = 1, upper = 30),
  makeIntegerParam("ntree", lower = 100, upper = 500),
  makeIntegerParam("nodesize", lower = 1, upper = 100)
)
classif_rforest = makeTuneWrapper(classif_rforest, resampling = inner_resamp,
  par.set = hypss, control = ctrl,
  measures = list(auc, logloss, f1, ber, acc, bac, mmce, timetrain),
  show.info = TRUE)
### Random Forest ############################################################################################################################
# Example data: mtcars, with the binary column "am" as the target
trainData = mtcars
target_feature = "am"
training_task_name = "trainingTask"
trainData[[target_feature]] = as.factor(trainData[[target_feature]])
trainTask = makeClassifTask(id = training_task_name, data = trainData,
  target = target_feature, positive = 1, fixup.data = "warn", check.data = TRUE)
# Fixed holdout split used as the outer resampling
train_indices = 1:25
valid_indices = 26:32
outer_resampling = makeFixedHoldoutInstance(train_indices, valid_indices, nrow(trainData))
# Parallelise over the inner tuning loop
no_of_cores = detectCores()
parallelStartSocket(no_of_cores, level = "mlr.tuneParams", logging = TRUE)
lrns = list(classif_gamboost, classif_rforest)
res = benchmark(tasks = trainTask, learners = lrns, resampling = outer_resampling,
  measures = list(logloss, auc, f1, ber, acc, bac, mmce, timetrain),
  show.info = TRUE, models = TRUE, keep.pred = FALSE)
parallelStop()
models = getBMRModels(res)
models
Answer
I would suggest training a new model with the function train to proceed further, for example for predicting new data points. That way the complete dataset is used for training, not only a part of it.
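A minimal sketch of this, assuming the tuned wrapper learner classif_gamboost and trainTask from your example are still in scope:
# Retraining the tuned wrapper re-runs the random search on the full task
# and then fits the best configuration found:
final_gamboost = train(classif_gamboost, trainTask)
# Predict with just this one model (replace trainData with genuinely new data):
new_preds = predict(final_gamboost, newdata = trainData)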
If you want to use the models from the benchmarking instead, you can get them via getBMRModels, as you already posted, and then pick out the specific model you want (access the specific list element with models$ ...).
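A sketch of drilling into that nested list: getBMRModels returns one entry per task, then per learner, then one fitted model per outer resampling iteration. The learner ids carry the wrapper suffixes, so it is safest to inspect the names rather than guess them:
models = getBMRModels(res)
names(models)        # task ids, here "trainingTask"
names(models[[1]])   # learner ids of the wrapped learners
gamboost_model = models[[1]][[1]][[1]]  # first task, first learner, iteration 1
# The hyperparameters chosen by the inner tuning are available separately:
tune_res = getBMRTuneResults(res)
# Use the extracted model on its own for prediction:
preds = predict(gamboost_model, newdata = trainData[valid_indices, ])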