predict.train vs predict using recipe objects

Question

After specifying a recipe to use in caret::train, I am trying to predict new samples. I have a couple of questions about this that I cannot find answered in the caret or recipes documentation.

Should I use predict() or predict.train()? What's the difference?
Should I bake the test data with the prepared recipe first before using predict? When using preProcess directly in train() you are advised not to preProcess new data as the train object will automatically do that. Is this the same when using recipes?

Below is a reproducible example illustrating my process and the difference in predictions when using predict vs predict.train

library(recipes)
library(caret)
# Data ----
data("credit_data")

credit_train <- credit_data[1:3500,]
credit_test <- credit_data[-(1:3500),]

# Set up recipe ----

set.seed(0)
Rec.Obj = recipe(Status ~ ., data = credit_train) %>%
    step_knnimpute(all_predictors()) %>% 
    step_center(all_numeric())%>%
    step_scale(all_numeric())

# Control parameters ----
set.seed(0)
TC = trainControl("cv",number = 10, savePredictions = "final", classProbs = TRUE, returnResamp = "final")


set.seed(0)
Model.Output = train(Rec.Obj,
                     credit_train,
                     trControl = TC,
                     tuneLength = 1,
                     metric = "Accuracy",
                     method = "glm")

# Prepped recipe ----
# prep() takes the data through its `training` argument (it has no `newdata` argument)
set.seed(0)
prep.rec <- 
    prep(Rec.Obj, training = credit_train)

# Baked data for observation ----
set.seed(0)
bake.train <- bake(prep.rec, new_data = credit_train)
bake.test <- bake(prep.rec, new_data = credit_test)

# investigation of prediction methods ----

# no application of recipe to newdata
set.seed(0)
predict.norm = predict(Model.Output, credit_test, type = "raw")
predict.train = predict.train(Model.Output, credit_test,  type = "raw")

identical(predict.norm,predict.train)
# evaluates to FALSE

# Apply recipe to new data (bake.test)
predict.norm.baked = predict(Model.Output, bake.test, type = "raw")
predict.train.baked = predict.train(Model.Output, bake.test, type = "raw")

identical(predict.norm.baked, predict.train.baked)
# evaluates to FALSE

# Comparison of both predict() funcs
identical(predict.norm, predict.norm.baked)
# evaluates to FALSE

topepo · Accepted Answer

The recipe is embedded into the train object. The answers are different for two reasons:

1. You are giving the recipe (embedded in Model.Output) data that has already been processed, so it gets processed a second time. You should not give predict() baked data; just use predict() and give it the original test set.
2. Let S3 do its thing: predict.train is for the x/y interface and predict.train.recipe is for the recipe interface. Just using predict() will do the appropriate thing.
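
Both effects can be reproduced in a few lines of base R. This is only a toy sketch, not caret's actual code: the classes toy_recipe_fit and toy_fit, the two predict methods, and the data are all hypothetical stand-ins, assuming the fitted object carries a recipe-specific class ahead of a more general one.

```r
# Toy stand-ins for a recipe-based fit; every name here is hypothetical.
train_x <- c(10, 20, 30, 40)

# The "fit" stores the training center/scale, like an embedded recipe would.
# The class vector is searched left to right during S3 dispatch.
fit <- structure(
  list(center = mean(train_x), scale = sd(train_x)),
  class = c("toy_recipe_fit", "toy_fit")
)

# Method for the recipe-style class: preprocesses raw data, then predicts.
predict.toy_recipe_fit <- function(object, newdata, ...) {
  z <- (newdata - object$center) / object$scale
  ifelse(z > 0, "good", "bad")
}

# Method for the parent class: assumes no embedded preprocessing is needed.
predict.toy_fit <- function(object, newdata, ...) {
  ifelse(newdata > 0, "good", "bad")
}

test_x  <- c(15, 35)
baked_x <- (test_x - fit$center) / fit$scale   # manual "bake" of the test data

raw_via_generic   <- predict(fit, test_x)          # raw data, S3 picks the method
baked_via_generic <- predict(fit, baked_x)         # data is preprocessed twice
raw_via_direct    <- predict.toy_fit(fit, test_x)  # dispatch bypassed, no preprocessing

print(raw_via_generic)
print(baked_via_generic)
print(raw_via_direct)
```

The three results disagree with one another, which mirrors the FALSE comparisons in the question: only the first call (raw data passed to the generic predict()) applies the stored preprocessing exactly once.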
