Reputation: 11
While working through the online mlr3 book, Applied Machine Learning Using mlr3 in R (https://mlr3book.mlr-org.com/chapters/chapter4/hyperparameter_optimization.html), I am having difficulty making sure that the hyperparameters are optimized only on the training data and that the subsequent prediction is made only on the test data. The code and the initial error are shown here. Note that after introducing this code, the chapter moves on to the auto_tuner() function to do this, but for my purposes I need to do it manually.
library(mlr3tuning)
library(mlr3tuningspaces)
library(mlr3learners)
library(mlr3extralearners)
library(e1071)
library(paradox)
#Specifying Task
tsk_sonar = tsk("sonar")
tsk_sonar$set_col_roles("Class", c("target", "stratum"))
#Partitioning Data set into Train and Test Samples
splits = mlr3::partition(tsk_sonar, ratio = 0.80)
#Defining Learner and range of hyperparameters for optimization
learner = lrn("classif.svm",
cost = to_tune(1e-5, 1e5, logscale = TRUE),
gamma = to_tune(1e-5, 1e5, logscale = TRUE),
kernel = "radial",
type = "C-classification"
)
#Specifying the rows constituting the training data set for the learner
learner$train(tsk_sonar, row_ids = splits$train)
#The call above fails with the following error:
#Error in svm.default(x = data, y = task$truth(), probability = (self$predict_type == :
#  'list' object cannot be coerced to type 'double'
#Specifying Tuning Instance
instance = ti(
task = tsk_sonar,
learner = learner,
resampling = rsmp("cv", folds = 3),
measures = msr("classif.ce"),
terminator = trm("none")
)
#Defining Hyperparameter Search
tuner = tnr("grid_search", resolution = 5, batch_size = 10)
#Running hyperparameter tuning for optimization
tuner$optimize(instance)
#Training the learner with the tuned hyperparameters on the full data set
lrn_svm_tuned = lrn("classif.svm")
lrn_svm_tuned$param_set$values = instance$result_learner_param_vals
#Final trained model for use in prediction
lrn_svm_tuned$train(tsk_sonar)$model
#Create predictions on the test data
prediction = lrn_svm_tuned$predict(tsk_sonar, splits$test)
Upvotes: 1
Views: 20
Reputation: 1491
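You found a bug. It shouldn't be possible to train the learner while a TuneToken is still present in the parameter set. The error arises because cost and gamma hold the TuneToken objects created by to_tune() rather than numeric values, so svm() cannot coerce them to type double. This has nothing to do with the train-test split. If you are really worried about the split, you can check the resampling splits in instance$archive$benchmark_result$resamplings after the optimization.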
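For the manual route asked about in the question, one possible sketch is to restrict a copy of the task to the training rows before building the tuning instance, so that the inner cross-validation never sees the test rows. This is only an illustration under assumptions: the names tsk_train, rsmp_used and rows_used are made up here, the seed is arbitrary, and it assumes the mlr3, mlr3tuning, mlr3learners and e1071 packages are installed.

```r
library(mlr3)
library(mlr3tuning)
library(mlr3learners)  # provides classif.svm (needs the e1071 package installed)

set.seed(1)  # arbitrary seed, only so the split is reproducible

tsk_sonar = tsk("sonar")
tsk_sonar$set_col_roles("Class", c("target", "stratum"))
splits = partition(tsk_sonar, ratio = 0.80)

# Illustrative name: a copy of the task restricted to the training rows
tsk_train = tsk_sonar$clone()$filter(splits$train)

learner = lrn("classif.svm",
  cost = to_tune(1e-5, 1e5, logscale = TRUE),
  gamma = to_tune(1e-5, 1e5, logscale = TRUE),
  kernel = "radial",
  type = "C-classification"
)

# Tune on the training rows only; the 3-fold CV happens inside tsk_train
instance = ti(
  task = tsk_train,
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce"),
  terminator = trm("none")
)
tuner = tnr("grid_search", resolution = 5, batch_size = 10)
tuner$optimize(instance)

# Fresh learner with concrete (non-TuneToken) values, fit on the training rows
lrn_svm_tuned = lrn("classif.svm")
lrn_svm_tuned$param_set$values = instance$result_learner_param_vals
lrn_svm_tuned$train(tsk_sonar, row_ids = splits$train)

# Predict and score on the held-out test rows only
prediction = lrn_svm_tuned$predict(tsk_sonar, row_ids = splits$test)
prediction$score(msr("classif.ce"))

# Check: every row the tuning resampling touched belongs to the training split
rsmp_used = instance$archive$benchmark_result$resamplings$resampling[[1]]
rows_used = unlist(lapply(seq_len(rsmp_used$iters), function(i) {
  c(rsmp_used$train_set(i), rsmp_used$test_set(i))
}))
all(rows_used %in% splits$train)
```

Filtering a clone keeps the original row ids, which is why the final check can compare the resampling rows directly against splits$train. The final model is trained with row_ids = splits$train rather than on the full task, so the test rows stay unseen until the predict step.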
Upvotes: 1