opening-the-black-box

Reputation: 43

MLR - getBMRModels - How to access each model from the benchmark result

When running a benchmark experiment on multiple algorithms, each wrapped with tuning wrappers etc., multiple models are returned for each algorithm.

What is the canonical way, or at least an effective way, of extracting each individual tuned model (with its tuned hyperparameters) so that it can be accessed and used on its own for predictions, without the baggage of all the other models?

Reproducible Example

# Load required packages
library(mlr)
library(parallelMap)
library(parallel)

# Algorithms
iterations = 10L
cv_iters = 2
### classif.gamboost ############################################################################################################################
classif_gamboost = makeLearner("classif.gamboost", predict.type="prob")

##The wrappers are presented in reverse order of application
###One-Hot Encoding
classif_gamboost = makeDummyFeaturesWrapper(classif_gamboost, method = "1-of-n")
###Missing Data Imputation
classif_gamboost = makeImputeWrapper(classif_gamboost, classes = list(numeric = imputeConstant(-99999), integer = imputeConstant(-99999), factor = imputeConstant("==Missing==")), dummy.type = "numeric", dummy.classes = c("numeric","integer"))

##### Tuning #####
inner_resamp = makeResampleDesc("CV", iters=cv_iters)
ctrl = makeTuneControlRandom(maxit=iterations)
hypss = makeParamSet(
  makeDiscreteParam("baselearner", values=c("btree")), #,"bols","btree","bbs"
  makeIntegerParam("dfbase", lower = 1, upper = 5),
  makeDiscreteParam("family", values=c("Binomial")),
  makeDiscreteParam("mstop", values=c(10,50,100,250,500,1000))
)
classif_gamboost = makeTuneWrapper(classif_gamboost, resampling = inner_resamp, par.set = hypss, control = ctrl, measures = list(auc, logloss, f1, ber, acc, bac, mmce, timetrain), show.info=TRUE)
### classif.gamboost ############################################################################################################################

### Random Forest ############################################################################################################################
classif_rforest = makeLearner("classif.randomForestSRC", predict.type="prob")

##The wrappers are presented in reverse order of application
###One-Hot Encoding
classif_rforest = makeDummyFeaturesWrapper(classif_rforest, method = "1-of-n")
###Missing Data Imputation
classif_rforest = makeImputeWrapper(classif_rforest, classes = list(numeric = imputeConstant(-99999), integer = imputeConstant(-99999), factor = imputeConstant("==Missing==")), dummy.type = "numeric", dummy.classes = c("numeric","integer"))

##### Tuning #####
inner_resamp = makeResampleDesc("CV", iters=cv_iters)
ctrl = makeTuneControlRandom(maxit=iterations)
hypss = makeParamSet(
  makeIntegerParam("mtry", lower = 1, upper = 30)
  ,makeIntegerParam("ntree", lower = 100, upper = 500)
  ,makeIntegerParam("nodesize", lower = 1, upper = 100)
)
classif_rforest = makeTuneWrapper(classif_rforest, resampling = inner_resamp, par.set = hypss, control = ctrl, measures = list(auc, logloss, f1, ber, acc, bac, mmce, timetrain), show.info=TRUE)
### Random Forest ############################################################################################################################

trainData = mtcars
target_feature = "am"
training_task_name = "trainingTask"
trainData[[target_feature]] = as.factor(trainData[[target_feature]])
trainTask = makeClassifTask(id=training_task_name, data=trainData, target=target_feature, positive="1", fixup.data="warn", check.data=TRUE)

train_indices = 1:25
valid_indices = 26:32
outer_resampling = makeFixedHoldoutInstance(train_indices, valid_indices, nrow(trainData))

no_of_cores = detectCores()
parallelStartSocket(no_of_cores, level=c("mlr.tuneParams"), logging = TRUE)
lrns = list(classif_gamboost, classif_rforest)
res = benchmark(tasks = trainTask, learners = lrns, resampling = outer_resampling, measures = list(logloss, auc, f1, ber, acc, bac, mmce, timetrain), show.info = TRUE, models = TRUE, keep.pred = FALSE)
parallelStop()

models = getBMRModels(res)
models

Upvotes: 2

Views: 556

Answers (1)

PhilippPro

Reputation: 698

I would suggest training a new model with the function train in order to proceed further, for example for predicting new data points. That way you use the complete dataset for training and not only a part of it.
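
As a minimal sketch of that approach, reusing the tune-wrapped learner, task and data objects from the question (the tune wrapper simply re-runs its inner tuning on the whole task before fitting):

# Retrain the tuned learner on the complete task
full_model = train(classif_gamboost, trainTask)

# Predict new data points; here the training data just stands in for new data
preds = predict(full_model, newdata = trainData)
head(as.data.frame(preds))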

If you want to use your models from the benchmarking, you can get them via getBMRModels, as you already posted, and then just pick out the specific model that you want. (Get the specific list element with models$ ...)
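
For example (a sketch only; the exact learner ids depend on the wrappers and typically end in ".tuned", so check names(models$trainingTask) first):

models = getBMRModels(res)
names(models$trainingTask)            # learner ids present in the benchmark

# models[[task.id]][[learner.id]] is a list with one model per outer
# resampling iteration; the fixed holdout above has one iteration, hence [[1]]
gam_model = models$trainingTask[[1]][[1]]

# Tuned hyperparameters stored inside the tune-wrapped model
getTuneResult(gam_model)$x

# Use this single model on its own for predictions
predict(gam_model, newdata = trainData)

If you only need the tuned hyperparameter settings rather than the fitted models, getBMRTuneResults(res) returns them directly for all tasks and learners.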

Upvotes: 1
