Reputation: 711
In mlr, it is possible to do filter feature selection together with hyperparameter tuning using nested cross validation, e.g. with the following code.
library(mlr)
lrn = makeFilterWrapper(learner = "regr.kknn", fw.method = "chi.squared")
ps = makeParamSet(makeDiscreteParam("fw.abs", values = 10:13),
makeDiscreteParam("k", values = c(2, 3, 4)))
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("CV", iters = 2)
outer = makeResampleDesc("Subsample", iters = 3)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)
res = resample(lrn, bh.task, outer, mse, extract = getTuneResult)
But as far as I know, it is not possible to do something like this using wrapper feature selection, e.g.:
lrn = makeFeatSelWrapper(learner = "regr.kknn", ww.method = "random") # imaginary code
ps = makeParamSet(makeDiscreteParam("maxit", 15),
makeDiscreteParam("k", values = c(2, 3, 4))) # imaginary code, no method parameter & no resampling provided
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("CV", iters = 2)
outer = makeResampleDesc("Subsample", iters = 3)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)
res = resample(lrn, bh.task, outer, mse, extract = getTuneResult)
Is there a way to achieve something like this, especially in order to avoid nested-nested cross-validation? Is there any methodological reason why this would not be appropriate? Actually, filter feature selection with a tuned number of features looks quite similar to the wrapper approach: in both cases the additional hyperparameter is effectively a certain set of features, either derived from a filter (e.g. "chi-squared") plus a threshold (top 90%, 80%, 70%) or produced by the wrapper algorithm (random, GA, exhaustive, sequential), and the best feature set is chosen based on inner-CV performance.
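For concreteness, the threshold variant mentioned above could be written just like the first block, only tuning fw.perc instead of fw.abs (a sketch only, reusing ctrl and inner as defined above):
lrn = makeFilterWrapper(learner = "regr.kknn", fw.method = "chi.squared")
ps = makeParamSet(makeDiscreteParam("fw.perc", values = c(0.9, 0.8, 0.7)),
  makeDiscreteParam("k", values = c(2, 3, 4)))
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)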
I believe both approaches (nested CV with additional filter parameters vs. nested-nested CV) are similar in computational complexity, but you might not want to reduce your training dataset further with nested-nested CV, which the first approach would avoid.
Is this a methodological error on my part, or is this simply a missing (probably not very popular) feature?
Upvotes: 4
Views: 1525
Reputation: 19716
This feature has been available in mlr since July. You need to install the GitHub version:
devtools::install_github("mlr-org/mlr")
TuneWrapper needs to be in the inner resampling loop, while FeatSelWrapper needs to be in the outer resampling loop. Here is an example using iris.task and rpart with backward selection:
library(mlr)
# tuning parameters:
ps <- makeParamSet(
  makeNumericParam("cp", lower = 0.01, upper = 0.1),
  makeIntegerParam("minsplit", lower = 10, upper = 20)
)
# grid search:
ctrl <- makeTuneControlGrid(resolution = 5L)
# specify learner:
lrn <- makeLearner("classif.rpart", predict.type = "prob")
# generate a tune wrapper:
lrn <- makeTuneWrapper(lrn, resampling = cv3, par.set = ps, control = ctrl, show.info = FALSE)
# generate a feature selection wrapper:
lrn <- makeFeatSelWrapper(lrn,
  resampling = cv3,
  control = makeFeatSelControlSequential(method = "sbs"), show.info = FALSE)
# perform resample:
res <- resample(lrn, task = iris.task, resampling = cv3, show.info = TRUE, models = TRUE)
# note that even this small example will take some time
res
#output
Resample Result
Task: iris_example
Learner: classif.rpart.tuned.featsel
Aggr perf: mmce.test.mean=0.1000000
Runtime: 92.1436
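Because models = TRUE was passed to resample(), the feature sets chosen in each outer fold can be inspected afterwards; assuming mlr's getFeatSelResult() accessor, something along these lines should work:
# feature selection result (selected features and performance) of each outer-fold model
lapply(res$models, getFeatSelResult)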
One can do the same thing without the outermost resample:
lrn <- makeLearner("classif.rpart", predict.type = "prob")
lrn <- makeTuneWrapper(lrn, resampling = cv3, par.set = ps, control = ctrl, show.info = TRUE)
res2 <- selectFeatures(learner = lrn, task = iris.task, resampling = cv3,
  control = makeFeatSelControlSequential(method = "sbs"), show.info = TRUE)
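selectFeatures() returns a FeatSelResult object; assuming its usual fields, the selected feature set and the corresponding cross-validated performance can then be read off directly:
# selected features and their inner-CV performance (assumed FeatSelResult fields)
res2$x
res2$y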
Upvotes: 2
Reputation: 7282
If I understood you correctly, you are basically asking how to tune a FeatSelWrapper? This is a bit complex, because feature selection (in mlr) depends on resampling: it is basically tuning. We don't tune learner parameters, but we tune the selection of features to optimize a performance measure, and to calculate that measure we need resampling.
So what you propose, in other words, is to tune the "feature tuning" by choosing the best parameter for the feature-selection algorithm. This naturally introduces another layer of nested resampling.
But it is debatable whether this is necessary, as the choice of feature selection method usually depends on your available resources and other circumstances.
What you can do is to benchmark different feature selection methods:
library(mlr)
inner = makeResampleDesc("CV", iters = 2)
outer = makeResampleDesc("Subsample", iters = 3)
settings = list(random1 = makeFeatSelControlRandom(maxit = 15), random2 = makeFeatSelControlRandom(maxit = 20))
lrns = Map(function(x, xn) {
  lrn = makeFeatSelWrapper(learner = "regr.lm", control = x, resampling = inner)
  lrn$id = paste0(lrn$id, ".", xn)
  lrn
}, x = settings, xn = names(settings))
benchmark(lrns, bh.task, outer, list(mse, timeboth))
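If the benchmark result is assigned, e.g. bmr = benchmark(lrns, bh.task, outer, list(mse, timeboth)), the aggregated performance of the two settings can be compared afterwards (a minimal sketch, assuming mlr's standard accessor):
# mean MSE and runtime per feature selection setting
getBMRAggrPerformances(bmr, as.df = TRUE)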
Upvotes: 1