Reputation: 711
Let's say I'm creating a learning curve like this (there may be small errors in the code; it's just a sample). What I want is a classical learning curve, where you enlarge the training set while keeping the validation/test set the same size.
learningCurve <- generateLearningCurveData(
  learners = makeLearner("classif.glmnet", predict.type = "prob"),
  task = sonar.task,
  resampling = makeResampleDesc("CV", iters = 5, predict = "both"),
  percs = seq(0.1, 1, by = 0.1),
  measures = list(setAggregation(auc, train.mean), setAggregation(auc, test.mean))
)
The problem with the code above is that the learners are indeed trained on a fraction of the training data, but the auc.train.mean
measure is evaluated on the whole training set. The result is not really the learning curve I want. I would like this measure to evaluate performance on the fraction of the training set that was actually used for learning, as is done here:
http://www.astroml.org/sklearn_tutorial/practical.html#learning-curves
I believe this sentence explains it all:
Note that when we train on a small subset of the training data, the training error is computed using this subset, not the full training set.
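The behaviour described in that sentence can be sketched manually in mlr by subsetting the task yourself: for each training fraction, train on a random subset and evaluate the training measure on that same subset. This is only a hypothetical workaround sketch, not mlr's built-in API; I use classif.glmnet on sonar.task here because auc requires a classification task with probability predictions.

```r
library(mlr)

# Hypothetical manual sketch: training error computed on the
# training subset itself, not on the full training set.
lrn <- makeLearner("classif.glmnet", predict.type = "prob")
fracs <- seq(0.1, 1, by = 0.1)

res <- sapply(fracs, function(f) {
  n <- getTaskSize(sonar.task)
  sub.id <- sample(n, size = floor(f * n))
  sub.task <- subsetTask(sonar.task, subset = sub.id)
  mod <- train(lrn, sub.task)
  # Predict on the SAME subset that was used for training:
  pred <- predict(mod, sub.task)
  performance(pred, measures = auc)
})
```

Each entry of res is then a training-set AUC measured on exactly the fraction of data the model saw, which is what the quoted sentence describes.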
How to achieve this?
Upvotes: 2
Views: 822
Reputation: 109242
The fix for this issue is in this pull request, which should be merged soon.
With the fix in place, I get the following learning curve for the full example in the comments:
Upvotes: 1
Reputation: 711
As a reference for future readers: this will be fixed; here is the GitHub issue:
https://github.com/mlr-org/mlr/issues/1357
Upvotes: 0