I am trying to tune hyperparameters in mlr using the tuneParams function, but I can't make sense of the results it gives me (or else I'm using it incorrectly). For example, if I create some data with a binary response, build an mlr h2o classification model, and check its accuracy and AUC, I get some values. Then, if I use tuneParams on some parameters, it finds a better accuracy and AUC; but when I plug the tuned values into my model, the resulting accuracy and AUC do not match those reported by tuneParams.
Hopefully the code below will illustrate my issue:
library(mlr)
# Create data
set.seed(1234)
Species <- sample(c("yes", "no"), size = 150, replace = T)
dat <- data.frame(
x1 = (Species == "yes") + rnorm(150),
x2 = (Species == "no") + rnorm(150), Species
)
# split into training and test
train <- sample(nrow(dat), round(.7*nrow(dat))) # split 70-30
datTrain <- dat[train, ]
datTest <- dat[-train, ]
# create mlr h2o model
task <- makeClassifTask(data = dat, target = "Species")
learner <- makeLearner("classif.h2o.deeplearning", predict.type = "prob",
par.vals = list(reproducible = TRUE,
seed = 1))
Mod <- train(learner, task)
# Test predictions
pred <- predict(Mod, newdata = datTest)
# Evaluate performance accuracy & area under curve
performance(pred, measures = list(acc, auc))
The result of the above performance check is:
acc auc
0.7111111 0.7813765
Now, if I tune just one of the parameters (e.g., epochs):
set.seed(1234)
# Tune epoch parameter
param_set <- makeParamSet(
makeNumericParam("epochs", lower = 1, upper = 10))
rdesc <- makeResampleDesc("CV", iters = 3L, predict = "both")
ctrl <- makeTuneControlRandom(maxit = 3)
res <- tuneParams(
learner = learner, task = task, resampling = rdesc, measures = list(auc, acc),
par.set = param_set, control = ctrl
)
The result I get from tuning epochs is:
Tune result:
Op. pars: epochs=1.95
auc.test.mean=0.8526496,acc.test.mean=0.7466667
Now, if I plug that value for epochs into the learner, retrain the model, and check the performance again:
set.seed(1234)
# plugging the tuned value into model and checking performance again:
learner <- makeLearner("classif.h2o.deeplearning", predict.type = "prob",
par.vals = list(epochs = 1.95,
reproducible = TRUE,
seed = 1))
Mod <- train(learner, task)
# Test predictions
pred1 <- predict(Mod, newdata = datTest)
# Evaluate performance accuracy & area under curve
performance(pred1, measures = list(acc, auc))
The resulting accuracy and AUC I get is now:
acc auc
0.6666667 0.8036437
My question is: why is there such a difference between the accuracy and AUC reported by tuneParams and those I get when I plug the tuned values into the learner? Or am I using or interpreting tuneParams incorrectly?
Upvotes: 0
Views: 209
You're getting different results because you're evaluating the learner on different train and test data: tuneParams estimated performance with 3-fold cross-validation on the full task, while your final check uses a single 70/30 holdout. If I use the same 3-fold CV, I get the same results:
set.seed(1234)
resample(learner, task, cv3, list(auc, acc))
Aggr perf: auc.test.mean=0.8526496,acc.test.mean=0.7466667
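To compare like with like, you can freeze the CV folds with a resampling instance and evaluate the tuned learner on exactly those folds. A minimal sketch, reusing rdesc and the tune result res from your question (setHyperPars, makeResampleInstance, and resample are standard mlr functions):
# Freeze the CV splits so every evaluation sees identical folds
rin <- makeResampleInstance(rdesc, task = task)
# Copy the tuned hyperparameters from the tune result into the learner
tunedLearner <- setHyperPars(learner, par.vals = res$x)
# Evaluate the tuned learner on the same fixed folds
resample(tunedLearner, task, rin, measures = list(auc, acc))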
In general, every computed performance value is only an estimate of the true generalization performance, and that estimate will vary with the resampling method you choose and with the particular data splits it produces.
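If you want an honest estimate of how the whole tuning procedure generalizes, the usual approach is nested resampling: tuning runs inside an inner CV, and performance is measured on an outer CV. A sketch, reusing param_set, ctrl, and rdesc from your question:
# Wrap the tuning step inside the learner: the inner CV (rdesc) picks epochs
tunedWrapper <- makeTuneWrapper(learner, resampling = rdesc,
                                measures = list(auc, acc),
                                par.set = param_set, control = ctrl)
# The outer 3-fold CV then estimates the performance of the tuned learner
outer <- resample(tunedWrapper, task, cv3, measures = list(auc, acc))
outer$aggr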
Upvotes: 2