Matek

Reputation: 711

R mlr - Creating learning curve from subset of training data and whole test data (not whole training data)?

Let's say I'm creating a learning curve like the one below (there may be small errors in the code; it's just a sample). What I want is a classical learning curve, where you enlarge the training set while keeping the validation/test set the same size.

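library(mlr)

# auc is a classification measure, so a classification task and a
# probability-predicting learner are used here instead of bh.task
learningCurve <- generateLearningCurveData(
  makeLearner("classif.glmnet", predict.type = "prob"),
  sonar.task,
  makeResampleDesc(method = "CV", iters = 5, predict = "both"),
  seq(0.1, 1, by = 0.1),
  list(setAggregation(auc, train.mean), setAggregation(auc, test.mean))
)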

The problem with the code above is that the learners are indeed trained on a fraction of the training data, but the auc.train.mean measure is evaluated on the whole training set. The result is not the learning curve I want. I would like this measure to be evaluated only on the fraction of the training set that was actually used for learning, as described here:

http://www.astroml.org/sklearn_tutorial/practical.html#learning-curves

I believe this sentence explains it all:

Note that when we train on a small subset of the training data, the training error is computed using this subset, not the full training set.

How can I achieve this?

Upvotes: 2

Views: 822

Answers (2)

Lars Kotthoff

Reputation: 109242

The fix for this issue is in this pull request, which should be merged soon.

With the fix in place, I get the following learning curve for the full example in the comments:

[Image: learning curve plot]
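Independently of that fix, the curve described in the question can be computed by hand. The following is only a minimal sketch under stated assumptions, not the method used in the pull request: it uses a single fixed holdout split instead of cross-validation, swaps in classif.rpart on sonar.task to keep dependencies small, and the names test.idx, train.idx, sub.idx and curve are illustrative. For each fraction, the model is trained on a random subset of the training indices, and the training AUC is computed on that same subset while the test AUC is always computed on the same held-out rows.

```r
library(mlr)

set.seed(1)

# Assumption: rpart learner and sonar.task are stand-ins for the real setup
lrn <- makeLearner("classif.rpart", predict.type = "prob")
task <- sonar.task

# Fixed holdout split: the test set stays the same for every fraction
n <- getTaskSize(task)
test.idx <- sample(n, size = round(0.3 * n))
train.idx <- setdiff(seq_len(n), test.idx)

percs <- seq(0.1, 1, by = 0.1)

curve <- do.call(rbind, lapply(percs, function(p) {
  # Draw the fraction p of the training rows
  sub.idx <- sample(train.idx, size = ceiling(p * length(train.idx)))
  mod <- train(lrn, task, subset = sub.idx)

  # Training error on the subset that was actually used for fitting
  pred.train <- predict(mod, task = task, subset = sub.idx)
  # Test error on the fixed holdout rows
  pred.test <- predict(mod, task = task, subset = test.idx)

  data.frame(
    perc = p,
    auc.train = unname(performance(pred.train, measures = auc)),
    auc.test = unname(performance(pred.test, measures = auc))
  )
}))

print(curve)
```

To get something closer to the cross-validated curve in the question, the same loop could be repeated over several splits and the two columns averaged per fraction.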

Upvotes: 1

Matek

Reputation: 711

For future readers: this will be fixed. Here is the GitHub issue:

https://github.com/mlr-org/mlr/issues/1357

Upvotes: 0
