Reputation: 597
After specifiying a recipe to use in caret::train I am trying to predict new samples. I have a couple of questions around this as I can not find in caret/recipes documentation.
Below is a reproducible example illustrating my process and the difference in predictions when using predict vs predict.train
library(recipes)
library(caret)
# Data ----
data("credit_data")
credit_train <- credit_data[1:3500,]
credit_test <- credit_data[-(1:3500),]
# Set up recipe ----
set.seed(0)
Rec.Obj = recipe(Status ~ ., data = credit_train) %>%
step_knnimpute(all_predictors()) %>%
step_center(all_numeric())%>%
step_scale(all_numeric())
# Control parameters ----
set.seed(0)
TC = trainControl("cv",number = 10, savePredictions = "final", classProbs = TRUE, returnResamp = "final")
set.seed(0)
Model.Output = train(Rec.Obj,
credit_train,
trControl = TC,
tuneLength = 1,
metric = "Accuracy",
method = "glm")
# Preped recipe ----
set.seed(0)
prep.rec <-
prep(Rec.Obj, newdata = credit_train)
# Baked data for observation ----
set.seed(0)
bake.train <- bake(prep.rec, new_data = credit_train)
bake.test <- bake(prep.rec, new_data = credit_test)
# investigation of prediction methods ----
# no application of recipe to newdata
set.seed(0)
predict.norm = predict(Model.Output, credit_test, type = "raw")
predict.train = predict.train(Model.Output, credit_test, type = "raw")
identical(predict.norm,predict.train)
# evaluates to FALSE
# Apply recipe to new data (bake.test)
predict.norm.baked = predict(Model.Output, bake.test, type = "raw")
predict.train.baked = predict.train(Model.Output, bake.test, type = "raw")
identical(predict.norm.baked, predict.train.baked)
# evaluates to FALSE
# Comparison of both predict() funcs
identical(predict.norm, predict.norm.baked)
# evaluates to FALSE
Upvotes: 4
Views: 804
Reputation: 14331
The recipe is embedded into the train
object. The answers are different for two reasons:
Since you are giving the recipe (inside of Model.Output
) the processed data to be re-processed. You should not give predict()
baked data; just use predict()
and give it the original test set..
Let S3 do its thing: predict.train
is for the x/y interface and predict.train.recipe
is for the recipe interface. Just using predict()
will do the appropriate thing.
Upvotes: 2