exAres

Reputation: 4926

How to predict on a new dataset using the caretEnsemble package in R?

I am using the caretEnsemble package in R to combine multiple models trained with caret. I obtained the list of trained models (call it model_list) with the caretList function from the same package:

    model_list <- caretList(
        x = input_predictors,
        y = input_labels,
        # twoClassSummary (set in myTrainControl) reports ROC, Sens and Spec,
        # so 'ROC' rather than 'Accuracy' is the metric it can optimize
        metric = 'ROC',
        tuneList = list(
            randomForestModel = caretModelSpec(method = 'rf',
                                               tuneLength = 1,
                                               preProcess = c('BoxCox', 'center', 'scale')),
            ldaModel = caretModelSpec(method = 'lda',
                                      tuneLength = 1,
                                      preProcess = c('BoxCox', 'center', 'scale')),
            logisticRegressionModel = caretModelSpec(method = 'glm',
                                                     tuneLength = 1,
                                                     preProcess = c('BoxCox', 'center', 'scale'))
        ),
        trControl = myTrainControl
    )

The trainControl object I passed in was:

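    myTrainControl = trainControl(method = "cv",
                                  number = 10,
                                  # A fixed index makes every model use the same resamples;
                                  # these bootstrap samples are used instead of the
                                  # 10-fold CV that method/number would otherwise create
                                  index = createResample(training_input_data$retinopathy, 10),
                                  savePredictions = TRUE,
                                  classProbs = TRUE,
                                  verboseIter = TRUE,
                                  summaryFunction = twoClassSummary)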

Then I build the ensemble from that list of models:

    ens <- caretEnsemble(model_list)

Calling summary(ens) reports the models selected from model_list, the weight assigned to each selected model, the out-of-sample AUC of each selected model, and finally the in-sample AUC of ens.

Now I want to measure the performance of ens on a separate test set, to get an idea of its out-of-sample performance. How can I do that?
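Once predictions for the test set are available, the out-of-sample AUC can be computed directly from the predicted probability of the positive class. Below is a minimal base-R sketch using the rank-sum (Mann-Whitney) form of the AUC. `auc_rank` is a hypothetical helper, not a caret or caretEnsemble function, and it assumes `prob` is a numeric vector of positive-class probabilities aligned with the true labels.

```r
# Hypothetical helper (not part of caret/caretEnsemble): AUC computed with the
# rank-sum (Mann-Whitney) formula. Tied probabilities receive average ranks.
auc_rank <- function(prob, labels, positive) {
  is_pos <- labels == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  if (n_pos == 0 || n_neg == 0) stop("labels must contain both classes")
  r <- rank(prob)
  # Sum of positive-class ranks minus its minimum possible value,
  # divided by the number of (positive, negative) pairs
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
```

For example, if `probs` holds the ensemble's positive-class probabilities for `test_data`, and `positive_level` (hypothetical) names the positive class of `retinopathy` (assuming `test_data` carries that outcome column), the test AUC would be `auc_rank(probs, test_data$retinopathy, positive_level)`. How `predict()` on a caretEnsemble object returns probabilities (a plain vector or a per-class table) may depend on the package version, so inspect its output first. The pROC package's `roc()` and `auc()` compute the same quantity.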

I tried:

    ensPredictions <- predict(ens, newdata = test_data)

but it gives the following error:

    Error in `[.data.frame`(out, , obsLevels, drop = FALSE) :
      undefined columns selected

Upvotes: 3

Views: 1765

Answers (1)

suresh

Reputation: 667

The first thing I'd check is whether the test set has all the features (columns) of your training set.
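A quick way to run that check is sketched below in base R. `check_newdata` is a hypothetical helper, and it assumes both the training predictors (`input_predictors` in the question) and `test_data` are data frames. Besides listing training columns missing from the new data, it flags factor columns whose test values include levels never seen in training, a related mismatch worth ruling out at the same time.

```r
# Hypothetical helper (not part of caret/caretEnsemble): compare the columns
# of the training predictors with those of the new data before predicting.
check_newdata <- function(train_x, new_x) {
  # Training columns that the new data lacks
  missing_cols <- setdiff(names(train_x), names(new_x))
  shared <- intersect(names(train_x), names(new_x))
  # Factor columns whose non-missing new values include unseen levels
  unseen_levels <- Filter(function(col) {
    if (!is.factor(train_x[[col]])) return(FALSE)
    values <- as.character(new_x[[col]])
    values <- unique(values[!is.na(values)])
    length(setdiff(values, levels(train_x[[col]]))) > 0
  }, shared)
  list(missing_cols = missing_cols, unseen_levels = unseen_levels)
}
```

Calling `check_newdata(input_predictors, test_data)` should return two empty character vectors when the test set lines up with the training predictors.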

Upvotes: 1
