Reputation: 4686
I am trying to set up an mlr classification task where 75% of the data is used for training, and this 75% is resampled by repeated cross-validation.
My setup of the task is as follows:
pred.Bin.Task <- makeClassifTask(id="CountyCrime", data=df, target="count.bins")
preProc.Task <- normalizeFeatures(pred.Bin.Task, method="range")
rdesc <- makeResampleDesc("RepCV", reps=3, folds=5)
inTraining <- caret::createDataPartition(df$count.bins, p = .75, list = FALSE)
But I couldn't get the resampling to work. When I do
lda.train <- resample("classif.lda", preProc.Task, rdesc, subset=inTraining)
I get the error
Error in setHyperPars2.Learner(learner, insert(par.vals, args)) :
classif.lda: Setting parameter subset without available description object!
You can switch off this check by using configureMlr!
Training without subsetting, i.e.
lda.train <- resample("classif.lda", preProc.Task, rdesc)
works.
I'd prefer to keep the whole dataset in the Task rather than just the training data, so that when I predict on the holdout data I don't need to pre-process and resubmit new data. Any suggestions on how I can get the subsetting right?
Upvotes: 1
Views: 527
Reputation: 109242
The cause of the error is that the resample function doesn't have a subset argument, so it's passed through to the learner as a hyperparameter, and the learner does not have such a parameter either.
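A quick way to check this yourself is to list the formal arguments of resample. This is only a sketch, assuming mlr is installed; the variable names resample.args, has.subset and has.dots are made up for illustration.

```r
library(mlr)

# Named arguments that resample() itself accepts
resample.args <- names(formals(resample))
print(resample.args)

# "subset" is not among them; unmatched named arguments end up in `...`
has.subset <- "subset" %in% resample.args
has.dots <- "..." %in% resample.args
```

If has.subset is FALSE and has.dots is TRUE, subset=inTraining is being collected by the dots and handed on to the learner, which matches the setHyperPars2.Learner call in the error message.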
mlr's resample descriptions don't allow you to keep data completely separate (i.e. not use it at all during training) as you're trying to do. However, you can use the subsetTask function to partition the data without having to preprocess again:
# R names are case-sensitive: the task in the question is preProc.Task
preproc.task.train = subsetTask(preProc.Task, inTraining)
resample("classif.lda", preproc.task.train, rdesc)
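For the holdout part of the question, here is a minimal end-to-end sketch of one way to do it, not necessarily the only one. It assumes the built-in iris data as a stand-in for df, a plain base-R split instead of createDataPartition, and an arbitrary seed; train.idx, test.idx and task.train are illustrative names. The idea is that both train and predict accept a subset of row indices, so the full preprocessed task can be reused for the holdout rows.

```r
library(mlr)

set.seed(1)  # arbitrary seed, only for reproducibility

# Stand-in data: iris instead of df, Species instead of count.bins
task <- makeClassifTask(id = "iris", data = iris, target = "Species")
task <- normalizeFeatures(task, method = "range")

# 75% of the row indices for training, the rest held out
n <- getTaskSize(task)
train.idx <- sample(n, size = floor(0.75 * n))
test.idx <- setdiff(seq_len(n), train.idx)

# Repeated cross-validation on the training rows only
rdesc <- makeResampleDesc("RepCV", reps = 3, folds = 5)
task.train <- subsetTask(task, subset = train.idx)
cv <- resample("classif.lda", task.train, rdesc, show.info = FALSE)

# Fit on the training rows, then predict the holdout rows from the full task
model <- train("classif.lda", task, subset = train.idx)
pred <- predict(model, task = task, subset = test.idx)
performance(pred, measures = mmce)
```

The split here is a plain integer vector. createDataPartition with list = FALSE returns a one-column matrix, so if subsetTask complains about the index type, passing inTraining[, 1] should be a safe alternative.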
Upvotes: 3