yPennylane

Reputation: 772

R caret: Combine rfe() and train()

I want to combine recursive feature elimination via rfe() with hyperparameter tuning and model selection via trainControl(), using the method rf (random forest). Instead of the default summary statistics, I would like to use the MAPE (mean absolute percentage error). I therefore tried the following code on the ChickWeight data set:

library(caret)
library(randomForest)
library(MLmetrics)

# Compute MAPE instead of other metrics
mape <- function(data, lev = NULL, model = NULL){
  mape <- MAPE(y_pred = data$pred, y_true = data$obs)
  c(MAPE = mape)
}

# specify trainControl
trc <- trainControl(method = "repeatedcv", number = 10, repeats = 3, search = "grid",
                    savePredictions = TRUE, summaryFunction = mape)
# set up grid
tunegrid <- expand.grid(.mtry=c(1:3))

# specify rfeControl
rfec <- rfeControl(functions=rfFuncs, method="cv", number=10, saveDetails = TRUE)

set.seed(42)
results <- rfe(weight ~ Time + Chick + Diet, 
           sizes=c(1:3), # subset sizes (numbers of predictors) to evaluate
           data = ChickWeight, 
           method="rf",
           ntree = 250, 
           metric= "RMSE", 
           tuneGrid=tunegrid,
           rfeControl=rfec,
           trControl = trc)

The code runs without errors. But where do I find the MAPE that I defined as the summaryFunction in trainControl? Is trainControl executed or ignored?

How could I rewrite the code to do recursive feature elimination with rfe, tune the hyperparameter mtry using trainControl within rfe, and at the same time compute an additional error measure (MAPE)?

Upvotes: 2

Views: 1164

Answers (1)

Julius Vainora

Reputation: 48251

trainControl is ignored here, as its description

"Control the computational nuances of the train function"

would suggest: it affects train(), not rfe(). To use MAPE, replace the summary function in the rfeControl object:

rfec$functions$summary <- mape
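
As a quick sanity check of what such a summary function has to look like, here is a minimal base R sketch. It assumes that rfe() hands the function a data frame with obs and pred columns, and that the name of the returned value ("MAPE") is what the metric argument refers to. The name mape_base and the toy data are hypothetical, and the formula is meant to be a stand-in for MLmetrics::MAPE.

```r
# Hypothetical base R stand-in for the mape() summary function from the question.
# Assumption: `data` has numeric columns `obs` (observed) and `pred` (predicted).
mape_base <- function(data, lev = NULL, model = NULL) {
  # Mean absolute percentage error, returned as a named numeric vector;
  # the name "MAPE" is what metric = "MAPE" would be matched against.
  c(MAPE = mean(abs((data$obs - data$pred) / data$obs)))
}

# Made-up toy data in the obs/pred layout (not taken from ChickWeight)
toy <- data.frame(obs = c(100, 200, 400), pred = c(110, 180, 400))

# Absolute percentage errors are 0.1, 0.1 and 0, so the mean is 0.2 / 3
mape_base(toy)
```

Assigning a function of this shape to rfec$functions$summary works the same way as assigning mape.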

Then call rfe() with metric and maximize set accordingly:

rfe(weight ~ Time + Chick + Diet, 
    sizes = c(1:3),
    data = ChickWeight, 
    method ="rf",
    ntree = 250, 
    metric = "MAPE", # Modified
    maximize = FALSE, # Modified
    rfeControl = rfec)
#
# Recursive feature selection
#
# Outer resampling method: Cross-Validated (10 fold) 
#
# Resampling performance over subset size:
#
#  Variables   MAPE  MAPESD Selected
#          1 0.1903 0.03190         
#          2 0.1029 0.01727        *
#          3 0.1326 0.02136         
#         53 0.1303 0.02041         
#
# The top 2 variables (out of 2):
#    Time, Chick.L
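
For the second part of the question (tuning mtry inside rfe), one possible approach is to switch from rfFuncs to caretFuncs, which fits each candidate subset with train(), so that extra arguments given to rfe() are passed on to train(). The following is a sketch under assumptions rather than a tested solution: the inner resampling uses 5-fold CV and train()'s default RMSE summary to pick mtry, while the outer rfe() loop is scored with MAPE. The names funcs, rfec_tuned, trc_inner and results_tuned are made up for illustration.

```r
library(caret)
library(randomForest)

# MAPE summary for the outer rfe() loop (base R version, assumed
# to be a stand-in for MLmetrics::MAPE)
mape <- function(data, lev = NULL, model = NULL) {
  c(MAPE = mean(abs((data$obs - data$pred) / data$obs)))
}

# caretFuncs fits every subset with train(), so mtry can be tuned per subset
funcs <- caretFuncs
funcs$summary <- mape

rfec_tuned <- rfeControl(functions = funcs, method = "cv", number = 10)

# Inner resampling used by train() to choose mtry (default summary, i.e. RMSE)
trc_inner <- trainControl(method = "cv", number = 5)

set.seed(42)
results_tuned <- rfe(weight ~ Time + Chick + Diet,
                     data = ChickWeight,
                     sizes = c(1:3),
                     metric = "MAPE",      # outer metric, used by rfe()
                     maximize = FALSE,
                     rfeControl = rfec_tuned,
                     # the remaining arguments are passed on to train()
                     method = "rf",
                     ntree = 250,
                     tuneGrid = expand.grid(mtry = 1:3),
                     trControl = trc_inner)
results_tuned
```

Because every outer fold now runs a full inner tuning loop, this is considerably slower than the rfFuncs version. For subsets with fewer than three predictors, randomForest will likely warn that mtry was reset to a valid value. If it runs as intended, results_tuned$fit should be the train object for the selected subset, so results_tuned$fit$bestTune would show the chosen mtry.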

Upvotes: 1
