Reputation: 705
TL;DR ANSWER: specify the training data in the newdata argument.
How do I consistently extract class probabilities from trained models with caret's predict? Currently I get an error when the model passed to predict was trained with the formula notation and a variable was excluded with -variable.
This can be reproduced with:
library(caret)

fit.lda <- train(Species ~ . -Petal.Length,
                 data = iris,
                 preProcess = c("center", "scale"),
                 trControl = trainControl(method = "repeatedcv",
                                          number = 10,
                                          repeats = 3,
                                          classProbs = TRUE,
                                          savePredictions = "final",
                                          selectionFunction = "best",
                                          summaryFunction = multiClassSummary),
                 method = "lda",
                 metric = "Mean_F1")
and then the following line will fail:
predict(fit.lda, type = "prob")
Error in predict.lda(modelFit, newdata) : wrong number of variables
If the -Petal.Length is omitted in the train formula, there is no error. Am I doing something wrong with the formula statement?
I suppose I could dig into the model's pred slot and grab the columns corresponding to the class types (see EDIT2), but this seems hackish. Is there a way to get predict to work as expected?
=====EDIT=====
I trained a number of different models (using formula notation) with caretList from the caretEnsemble package, and I got various errors when trying to use predict:
knn: Error in knn3Train(train = c(....) : dims of 'test' and 'train differ
svmRadial: Warning message: In method$prob(modelFit = modelFit, newdata = newdata, submodels = param) : kernlab class probability calculations failed; returning NAs
mlpML: Error in myFunc[[1]](x, ...) : number of input data columns 28 does not match number of input neurons 20
Methods that worked without errors were nnet and tree-based methods (rf, xgbTree).
=====EDIT2=====
The following doesn't take repeated resampling into account. The selected answer is much simpler.
Here's a self-fashioned solution for extracting probabilities from the trained model, but for standardization, I'd prefer to get predict to behave if that's possible.
grabProbs <- function(model) model$pred[, colnames(model$pred) %in% model$levels]
grabProbs(fit.lda)
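Since grabProbs returns one row per held-out prediction, repeated CV yields several rows per training observation. A minimal sketch of one way to collapse them, assuming model$pred carries a rowIndex column identifying the original observation (as it does when savePredictions is set): average the class-probability columns per rowIndex. The helper name avgProbs and the toy data frame are hypothetical, used here only so the example runs without caret.

```r
# Hypothetical helper (not part of caret): average held-out class
# probabilities over repeated resamples, one row per training observation.
avgProbs <- function(pred, levels) {
  # pred:   data frame shaped like model$pred (one row per held-out prediction)
  # levels: character vector of class names, e.g. model$levels
  out <- aggregate(pred[, levels, drop = FALSE],
                   by = list(rowIndex = pred$rowIndex),
                   FUN = mean)
  out[order(out$rowIndex), ]
}

# Toy stand-in for model$pred: 2 observations, each held out in 2 repeats
toy <- data.frame(
  setosa     = c(0.9, 0.7, 0.2, 0.4),
  versicolor = c(0.1, 0.3, 0.8, 0.6),
  rowIndex   = c(1, 1, 2, 2),
  Resample   = c("Fold1.Rep1", "Fold1.Rep2", "Fold2.Rep1", "Fold2.Rep2")
)
res <- avgProbs(toy, c("setosa", "versicolor"))
print(res)
```

With the model above this would be called as avgProbs(fit.lda$pred, fit.lda$levels). Note that these are cross-validated (held-out) probabilities, so they will generally differ from what predict returns on the training data.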
Upvotes: 0
Views: 1798
Reputation: 23109
Just use the newdata parameter and it will work:
predict(fit.lda, newdata = iris, type = "prob")
[EDITED]
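As we can see, calling predict with and without newdata gives identical results for lda, but not for randomForest, which returns out-of-bag predictions when newdata is omitted: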
library(MASS)
fit.lda <- lda(Species ~ . -Petal.Length, data = iris)
identical(predict(fit.lda), predict(fit.lda, newdata=iris))
# [1] TRUE
library(randomForest)
fit.rf <- randomForest(Species ~ . -Petal.Length, data = iris)
identical(predict(fit.rf), predict(fit.rf, newdata=iris))
# [1] FALSE
Upvotes: 0