

caret: `predict` fails when `train` formula has deleted variables

TL;DR ANSWER: pass the training data explicitly through the newdata argument.

How do I consistently extract class probabilities from trained models with caret's predict? Currently I get an error when the model passed to predict was trained with formula notation and a variable was excluded with -variable.

This can be reproduced with:

library(caret)

fit.lda <- train(Species ~ . -Petal.Length,
  data = iris, 
  preProcess = c("center", "scale"), 
  trControl = trainControl(method = "repeatedcv", 
    number = 10, 
    repeats = 3, 
    classProbs = TRUE, 
    savePredictions = "final", 
    selectionFunction = "best", 
    summaryFunction = multiClassSummary), 
  method = "lda", 
  metric = "Mean_F1")

and then the following line will fail:

predict(fit.lda, type = "prob")

Error in predict.lda(modelFit, newdata) : wrong number of variables

If the -Petal.Length is omitted in the train formula, there is no error. Am I doing something wrong with the formula statement?
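
To see where a column mismatch like this can come from, it may help to expand the formula by hand. The following is a base-R sketch only: it shows how `. -Petal.Length` resolves against iris, not what caret does internally. That predict without newdata falls back to data that still contains all four predictors is an assumption based on the error message, not something verified here.

```r
# Expand the formula against iris to see which predictors it keeps
tt <- terms(Species ~ . - Petal.Length, data = iris)
kept <- attr(tt, "term.labels")

# Design matrix built from the formula: intercept plus the kept predictors
x <- model.matrix(delete.response(tt), data = iris)

n_formula <- ncol(x) - 1      # predictors the model was fitted on
n_raw     <- ncol(iris) - 1   # predictors in the untouched data frame

print(kept)
print(c(formula = n_formula, raw = n_raw))
```

If the fitted model expects the three kept columns but receives all four raw ones, a "wrong number of variables" error would be the natural result.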

I suppose I could dig into the model's pred slot and grab the columns corresponding to the class types (see EDIT2), but this seems hackish. Is there a way to get predict to work as expected?


EDIT: I trained a number of different models (using formula notation) with caretList from the caretEnsemble package, and got various errors and warnings when calling predict:

Error in knn3Train(train = c(....) : dims of 'test' and 'train differ

Warning message: In method$prob(modelFit = modelFit, newdata = newdata, submodels = param) : kernlab class probability calculations failed; returning NAs

Error in myFunc[[1]](x, ...) : number of input data columns 28 does not match number of input neurons 20

The methods that worked without errors were nnet and the tree-based methods (rf, xgbTree).


EDIT2: The following doesn't take repeated resampling into account. The selected answer is much simpler.

Here's a home-made solution for extracting probabilities from the trained model, although for the sake of standardization I'd prefer to get predict to behave.

grabProbs <- function(model) model$pred[, colnames(model$pred) %in% model$levels]
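
With repeated CV, each training row shows up once per repeat in the held-out predictions, so grabProbs returns several rows per observation. A possible way to handle that, sketched here under assumptions, is to average the class probabilities per rowIndex. The data frame below is a hypothetical stand-in for model$pred (made-up numbers, two classes), and avgProbs is an illustrative helper name, not part of caret.

```r
# Hypothetical stand-in for model$pred from a repeated-CV fit:
# every training row (rowIndex) appears once per repeat.
pred <- data.frame(
  rowIndex   = c(1, 2, 1, 2),
  Resample   = c("Fold1.Rep1", "Fold2.Rep1", "Fold2.Rep2", "Fold1.Rep2"),
  setosa     = c(0.9, 0.2, 0.7, 0.4),
  versicolor = c(0.1, 0.8, 0.3, 0.6)
)
lvls <- c("setosa", "versicolor")

# Average the held-out class probabilities over repeats, one row per observation
avgProbs <- function(pred, lvls) {
  aggregate(pred[, lvls, drop = FALSE],
            by = list(rowIndex = pred$rowIndex),
            FUN = mean)
}

out <- avgProbs(pred, lvls)
print(out)
```

On a real fit this would be called as avgProbs(model$pred, model$levels). Note that these are resampled (held-out) probabilities, not the same thing as predictions from the final model on the training data.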



Answers (1)

Sandipan Dey

Just pass the data through the newdata parameter and it will work:

predict(fit.lda, newdata = iris, type = "prob")
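
For a self-contained check, here is a minimal sketch that refits a simpler version of the model (plain 5-fold CV instead of the repeated CV in the question, which is my simplification) and inspects what comes back. It assumes the caret and MASS packages are installed.

```r
library(caret)

set.seed(1)

# Same formula as in the question, with a deliberately simpler resampling setup
fit <- train(Species ~ . - Petal.Length,
             data = iris,
             method = "lda",
             trControl = trainControl(method = "cv",
                                      number = 5,
                                      classProbs = TRUE))

# Supplying newdata lets predict rebuild the predictors from the formula
probs <- predict(fit, newdata = iris, type = "prob")

str(probs)
```

The result should be a data frame with one row per observation and one column per class, each row summing to 1.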


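For lda, predicting without newdata gives the same result as predicting on the training data. For randomForest it does not, because predict without newdata returns the out-of-bag predictions instead: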

library(MASS)
fit.lda <- lda(Species ~ . -Petal.Length, data = iris)
identical(predict(fit.lda), predict(fit.lda, newdata=iris))
# [1] TRUE

library(randomForest)
fit.rf <- randomForest(Species ~ . -Petal.Length, data = iris)
identical(predict(fit.rf), predict(fit.rf, newdata=iris))
# [1] FALSE
