R mlr - Creating learning curve from subset of training data and whole test data (not whole training data)?

Question

Let's say I'm creating a learning curve like the one below (there may be small errors in the code; it's just a sample). What I want is a classical learning curve, where you enlarge the training set while keeping the validation/test set the same size.

learningCurve <- generateLearningCurveData(
  learners   = "regr.glmnet",
  task       = bh.task,
  resampling = makeResampleDesc(method = "CV", iters = 5, predict = "both"),
  percs      = seq(0.1, 1, by = 0.1),
  # regression measure (bh.task is a regression task), aggregated on train and test predictions
  measures   = list(setAggregation(mse, train.mean), setAggregation(mse, test.mean))
)

The problem with the code above is that the learners are indeed trained on a fraction of the training data, but the mse.train.mean measure is evaluated on the whole training set. This is not the learning curve I want. I would like this measure to be evaluated on the fraction of the training set that was actually used for learning, as described here:

http://www.astroml.org/sklearn_tutorial/practical.html#learning-curves

I believe this sentence explains it all:

Note that when we train on a small subset of the training data, the training error is computed using this subset, not the full training set.

How can I achieve this?

Lars Kotthoff · Accepted Answer

The fix for this issue is in this pull request, which should be merged soon.
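
For illustration, the behaviour asked for in the question can also be sketched by hand, without relying on the fix. This is only a minimal sketch under some assumptions that are not part of the pull request: it uses a single fixed 70/30 holdout split instead of cross-validation, the "regr.lm" learner instead of glmnet, and it drops the factor feature "chas" so that very small training subsets do not run into a factor with only one observed level. The training error is computed on exactly the subset used for fitting, and the test set is the same for every fraction.

```r
library(mlr)

set.seed(1)

# Assumption for this sketch: drop the factor feature "chas" so that small
# training subsets cannot contain only one observed level of it.
task <- dropFeatures(bh.task, "chas")
lrn  <- makeLearner("regr.lm")

# Fixed split: the test set stays the same for every training fraction.
n         <- getTaskSize(task)
test.idx  <- sample(n, size = round(0.3 * n))
train.idx <- sample(setdiff(seq_len(n), test.idx))  # shuffled training indices

fractions <- seq(0.1, 1, by = 0.1)

curve <- do.call(rbind, lapply(fractions, function(p) {
  # Nested subsets: each fraction takes the first p * 100% of the shuffled indices.
  sub <- train.idx[seq_len(ceiling(p * length(train.idx)))]
  mod <- train(lrn, task, subset = sub)

  # Training error on the subset actually used for fitting, not the full training set.
  train.mse <- performance(predict(mod, task = task, subset = sub), measures = mse)
  # Test error on the fixed held-out set.
  test.mse  <- performance(predict(mod, task = task, subset = test.idx), measures = mse)

  data.frame(fraction  = p,
             n.train   = length(sub),
             train.mse = unname(train.mse),
             test.mse  = unname(test.mse))
}))

print(curve)
```

Plotting train.mse and test.mse against n.train then gives the classical picture: both errors are measured for the same model, one on the data it was fitted to and one on the unchanged test set.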

With the fix in place, I get the following learning curve for the full example in the comments:
