Simon Hviid Del Pin

Reputation: 103

R: Using MLR (or caret or....) to tune parameters for XGBoost

Having worked through several tutorials, I have managed to write a script that successfully uses XGBoost to predict categorical prices on the Boston housing dataset.

However, I cannot successfully tune the parameters of the model using CV, even after trying several solutions from tutorials and posts here on Stack Overflow.

My best outcome so far is very 'hacky' and only tunes a single parameter:

steps <- seq(75, 90, 5) / 100
for (i in steps) {
    .....
}
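For reference, a filled-in loop of this kind might look roughly like the following (an illustration only: here it tunes subsample with xgb.cv and binarises medv so the snippet is self-contained; the omitted body above is not reproduced):

library(xgboost)
library(MASS)                                  # Boston housing data

# toy two-class price target so the example runs on its own
y <- as.numeric(Boston$medv > median(Boston$medv))
X <- as.matrix(Boston[, setdiff(names(Boston), "medv")])
dtrain <- xgb.DMatrix(X, label = y)

steps  <- seq(75, 90, 5) / 100
cv_err <- numeric(length(steps))
for (i in seq_along(steps)) {
    cv <- xgb.cv(params  = list(objective = "binary:logistic",
                                subsample = steps[i]),
                 data    = dtrain,
                 nrounds = 50, nfold = 5,
                 metrics = "error", verbose = FALSE)
    cv_err[i] <- min(cv$evaluation_log$test_error_mean)
}
steps[which.min(cv_err)]                       # best value of the tuned parameter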

But I see all of these fancy setups that run through several parameters automatically using mlr, caret, or NMOF. However, I haven't come close to getting any of them to work on these data. I suspect this is because most are set up for binary classification, but even when I address that as best I can, I have no success. I could provide hundreds of lines of code that do not work, but I think it is easiest to show my code as far as it works and ask how you would progress from here, rather than swamping you in my poor code.
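For reference, the kind of multi-parameter setup I mean (here with caret) looks roughly like this; it is a sketch along the lines of those tutorials rather than my failing code, reusing the X and y from the snippet above:

library(caret)

# candidate values for several parameters at once; recent versions of
# caret's "xgbTree" method expect a column for each of its tuning parameters
grid <- expand.grid(nrounds          = c(50, 100),
                    max_depth        = c(3, 6),
                    eta              = c(0.1, 0.3),
                    gamma            = 0,
                    colsample_bytree = 1,
                    min_child_weight = 1,
                    subsample        = c(0.75, 0.9))

ctrl <- trainControl(method = "cv", number = 5)

fit <- train(x = X, y = factor(y),     # a factor response also covers multi-class
             method    = "xgbTree",
             trControl = ctrl,
             tuneGrid  = grid)
fit$bestTune                           # best parameter combination found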

Edit: As I have not had any success even running other people's scripts, here are some additional details:

 > packageVersion("mlr")
 ‘2.11’

 > packageVersion("xgboost")
 ‘0.6.4.1’

Upvotes: 1

Views: 2809

Answers (1)

GegznaV

Reputation: 5600

First, update mlr and the other required packages. Then consider the quickstart example from the mlr cheatsheet:
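# (if needed, update the packages used below first, e.g.
#  install.packages(c("mlr", "xgboost", "mlbench")))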

library(mlr)
#> Loading required package: ParamHelpers
library(mlbench)
data(Soybean)

set.seed(180715)

soy = createDummyFeatures(Soybean, target = "Class")
tsk = makeClassifTask(data = soy, target = "Class")
ho = makeResampleInstance("Holdout", tsk)
tsk.train = subsetTask(tsk, ho$train.inds[[1]])
tsk.test = subsetTask(tsk, ho$test.inds[[1]])

lrn = makeLearner("classif.xgboost", nrounds=10)
#> Warning in makeParam(id = id, type = "numeric", learner.param = TRUE, lower = lower, : NA used as a default value for learner parameter missing.
#> ParamHelpers uses NA as a special value for dependent parameters.

cv = makeResampleDesc("CV", iters=5)
res = resample(lrn, tsk.train, cv, acc)
#> Resampling: cross-validation
#> Measures:             acc
#> [Resample] iter 1:    0.9010989
#> [Resample] iter 2:    0.9230769
#> [Resample] iter 3:    0.9120879
#> [Resample] iter 4:    0.9230769
#> [Resample] iter 5:    0.9450549
#> 
#> Aggregated Result: acc.test.mean=0.9208791
#> 

# Tune hyperparameters
ps = makeParamSet(makeNumericParam("eta", 0, 1),
                  makeNumericParam("lambda", 0, 200),
                  makeIntegerParam("max_depth", 1, 20)
)
tc = makeTuneControlMBO(budget = 100)
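# note: MBO tuning requires the mlrMBO package to be installed;
# makeTuneControlRandom(maxit = 100) is a simpler drop-in alternative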
tr = tuneParams(lrn, tsk.train, cv5, acc, ps, tc)
#> [Tune] Started tuning learner classif.xgboost for parameter set:
#>              Type len Def   Constr Req Tunable Trafo
#> eta       numeric   -   -   0 to 1   -    TRUE     -
#> lambda    numeric   -   - 0 to 200   -    TRUE     -
#> max_depth integer   -   -  1 to 20   -    TRUE     -
#> With control class: TuneControlMBO
#> Imputation value: -0
#> [Tune-x] 1: eta=0.529; lambda=194; max_depth=18
#> [Tune-y] 1: acc.test.mean=0.7846154; time: 0.0 min

# /... output truncated .../

#> [Tune-x] 100: eta=0.326; lambda=0.0144; max_depth=19
#> [Tune-y] 100: acc.test.mean=0.9340659; time: 0.0 min
#> [Tune] Result: eta=0.325; lambda=0.00346; max_depth=20 : acc.test.mean=0.9450549

lrn = setHyperPars(lrn, par.vals = tr$x)

# Evaluate performance
mdl = train(lrn, tsk.train)
prd = predict(mdl, tsk.test)
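# holdout accuracy of the tuned model can then be checked with:
performance(prd, acc)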

# Final model
mdl = train(lrn, tsk)

More explanations are in the cheatsheet (use the .pptx version if you want to copy the descriptions as well as the code).

Upvotes: 4
