Stefano Potter

Reputation: 3577

Error creating ensemble model with caretEnsemble

I want to train some models using caret and then, after tuning their parameters, compare a caretEnsemble of them against the individual models. I am testing this on the Boston housing dataset. My code so far is:

library(caret)
library(ranger)
library(randomForest)
library(caretEnsemble)
library(xgboost)
library(mlbench)
library(e1071)
library(GAMBoost)
library(quantregForest)
library(glmnet)

#load in boston housing dataset
data(BostonHousing)

df <- data.frame(BostonHousing)

#set random seed for reproduction
set.seed(54321)

#break into train and test
indexes <- createDataPartition(df$medv, times = 1, p = 0.7, list = FALSE)

train <- df[indexes,]
test <- df[-indexes,]

#set train control
my_control <- trainControl(method = "repeatedcv",
                             number = 10,
                             repeats = 3,
                             savePredictions = 'final',
                             allowParallel = T,
                             index = createResample(train$medv, 25))

#create the model list; tuneLength here should pick the tuning parameters
model_list <- caretList(
  medv~., data=train,
  trControl=my_control,
  metric="RMSE",
  methodList=c("glm"),
  tuneList=list(
    ranger=caretModelSpec(method = 'ranger', tuneLength = 2),
    rf=caretModelSpec(method = 'rf', tuneLength = 2),
    quantile=caretModelSpec(method = 'qrf', tuneLength = 2),
    ridge=caretModelSpec(method = 'ridge', tuneLength = 2),
    bam=caretModelSpec(method = 'gamboost', tuneLength = 2),
    svm=caretModelSpec(method = 'svmPoly', tuneLength = 2)
  )
)

At this point I get the error:

Something is wrong; all the RMSE metric values are missing:
      RMSE        Rsquared        MAE     
 Min.   : NA   Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA   Max.   : NA  
 NA's   :2     NA's   :2     NA's   :2    
Error: Stopping
In addition: There were 50 or more warnings (use warnings() to see the first 50)

And I don't understand why. The rest of what I am trying to do, once this works, is as follows:

#set new seed
set.seed(101)

#set new train control
trainControl = trainControl(method="repeatedcv",
                            number=10,
                            repeats=3,
                            savePredictions='final',
                            allowParallel = T,
                            index = createResample(train$medv, 25))

#ensemble the models
greedy_ensemble <- caretEnsemble(
  model_list,
  metric="RMSE",
  trControl=trainControl)

summary(greedy_ensemble)


#predict on test data
stack_predicteds <- predict(greedy_ensemble, newdata=test)
head(stack_predicteds)

Upvotes: 1

Views: 304

Answers (1)

NeuroNaut

Reputation: 78

Just in case someone else struggles with the gamboost package: it seems the package is no longer called gamboost but mboost (which has a function gamboost). Apart from that, your code runs smoothly if you comment out the gamboost line. I switched on verboseIter, and it seems that when caret aggregates the results there is an issue with the results from gamboost. An older version of the package might mitigate the problem, but at least now you know it's an issue with this function.
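If it helps, here is a rough way to find out which method is the culprit: fit each one on its own and catch the error instead of letting the whole caretList call stop. This is only a sketch; the names train_df, ctrl, candidate_methods and status are mine, and I use 5 bootstrap resamples purely to keep it quick (an assumption, not the setup from the question). It also assumes the packages behind each method are installed; a missing package simply shows up as an error message for that method.

```r
library(caret)
library(mlbench)

# Same data and split as in the question
data(BostonHousing)
set.seed(54321)
idx <- createDataPartition(BostonHousing$medv, p = 0.7, list = FALSE)
train_df <- BostonHousing[idx, ]

# Small, fast resampling setup, only for diagnosing failures
ctrl <- trainControl(method = "boot", number = 5)

# The caret method names used in the question
candidate_methods <- c("glm", "ranger", "rf", "qrf", "ridge", "gamboost", "svmPoly")

# Fit each method separately; record "ok" or the error message
status <- sapply(candidate_methods, function(m) {
  tryCatch({
    train(medv ~ ., data = train_df, method = m,
          trControl = ctrl, tuneLength = 2)
    "ok"
  }, error = function(e) conditionMessage(e))
})

print(status)
```

Any entry that is not "ok" points at the method to drop or investigate; calling warnings() right after a failing fit usually shows the underlying message from the model package.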

my_control <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 3,
                           savePredictions = 'final',
                           allowParallel = T,
                           verboseIter = T,
                           index = createResample(train$medv, 25)
                           # summaryFunction = defaultSummary
                           )

#create the model list; tuneLength here should pick the tuning parameters
model_list <- caretList(
  medv~., data=train,
  trControl=my_control,
  metric="RMSE",
  methodList=c("glm"),
  tuneList=list(
    ranger=caretModelSpec(method = 'ranger', tuneLength = 2),
    rf=caretModelSpec(method = 'rf', tuneLength = 2),
    quantile=caretModelSpec(method = 'qrf', tuneLength = 2),
    ridge=caretModelSpec(method = 'ridge', tuneLength = 2),
    #bam=caretModelSpec(method = 'gamboost', tuneLength = 2), # that's the issue here!
    svm=caretModelSpec(method = 'svmPoly', tuneLength = 2)
  )
)

I checked the resample results, and they work:

results <- resamples(model_list)
summary(results)

Call:
summary.resamples(object = results)

Models: ranger, rf, quantile, ridge, svm, glm
Number of resamples: 25

MAE
             Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
ranger   2.108154 2.266169 2.439825 2.458017 2.588916 3.131053    0
rf       2.139905 2.368468 2.554799 2.608758 2.762459 3.530363    0
quantile 2.057143 2.300368 2.523729 2.595198 2.841045 3.497674    0
ridge    3.139259 3.553962 3.734990 3.768796 3.902688 4.507217    0
svm      2.706577 2.976414 3.116446 3.185859 3.347149 4.191346    0
glm      3.139259 3.553962 3.734990 3.768796 3.902688 4.507217    0

RMSE
             Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
ranger   2.979082 3.458703 3.855317 3.913947 4.370950 5.155696    0
rf       3.013475 3.427534 3.789188 4.012114 4.459492 5.839193    0
quantile 2.943785 3.246320 3.757453 4.075292 4.821078 5.933008    0
ridge    4.425996 4.964852 5.373967 5.348546 5.719916 6.490633    0
svm      3.695856 4.761753 5.038852 5.192791 5.784132 6.872716    0
glm      4.425996 4.964852 5.373967 5.348546 5.719916 6.490633    0

Rsquared
              Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
ranger   0.7471052 0.7881198 0.8481290 0.8366974 0.8777298 0.8976639    0
rf       0.6765381 0.7587662 0.8527462 0.8234661 0.8732058 0.8860521    0
quantile 0.6397501 0.7454151 0.8632883 0.8173972 0.8732256 0.8955925    0
ridge    0.6042317 0.6609395 0.6840746 0.6955365 0.7320912 0.7768662    0
svm      0.6223899 0.6727539 0.7243262 0.7306720 0.7856609 0.8595945    0
glm      0.6042317 0.6609395 0.6840746 0.6955365 0.7320912 0.7768662    0
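The numbers above are resampled estimates on the training data. Since the goal in the question is to compare models, the same comparison can be made on the held-out test set with caret's postResample. The sketch below is deliberately small and self-contained: it refits only two methods (glm and knn, chosen by me because they need no extra packages) with a quick 5-resample bootstrap, which is an assumption and not the setup from the question. The names train_df, test_df, fits and test_metrics are mine as well.

```r
library(caret)
library(mlbench)

# Same data and split as in the question
data(BostonHousing)
set.seed(54321)
idx <- createDataPartition(BostonHousing$medv, p = 0.7, list = FALSE)
train_df <- BostonHousing[idx, ]
test_df  <- BostonHousing[-idx, ]

# Quick resampling setup, just for illustration
ctrl <- trainControl(method = "boot", number = 5)

# Two example models; swap in the entries of model_list in practice
fits <- list(
  glm = train(medv ~ ., data = train_df, method = "glm", trControl = ctrl),
  knn = train(medv ~ ., data = train_df, method = "knn",
              trControl = ctrl, tuneLength = 2)
)

# RMSE, Rsquared and MAE of each model on the test set
test_metrics <- sapply(fits, function(fit) {
  postResample(pred = predict(fit, newdata = test_df), obs = test_df$medv)
})

print(round(test_metrics, 3))
```

The result has one column per model and one row per metric, so the ensemble's test-set predictions can be scored with the same postResample call and placed alongside.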

Upvotes: 1
