Reputation: 47
I'm trying to write a function that fits resamples off a recipe that uses step_ns(). For some reason I am getting the error messages:
Fold01: recipe: Error: Not all variables in the recipe are present in the supplied training set
and so on for all the folds. And then
Warning message:
All models failed in [fit_resamples()]. See the .notes column.
This is my code:
compare_basis_exp_to_base_mod <- function (data, outcome, metric, ...) {
  outcome <- rlang::enquo(outcome)
  metric <- rlang::enquo(metric)

  pred_list <- colnames(data)
  outcome_str <- substring(deparse(substitute(outcome)), 2)
  outcome_str_id <- which(colnames(data) %in% outcome_str)
  predictor <- pred_list[-outcome_str_id]

  data <- data %>%
    rename(prediction = !!outcome)

  res <- tibble(splits = list(), id = character(), .metrics = list(),
                .notes = list(), .predictions = list(), pred = character())

  rec_without_splines <- recipe(prediction ~ ., data = data) %>%
    prep()
  rec_with_splines <- recipe(prediction ~ ., data = data) %>%
    step_ns(all_predictors(), ...) %>%
    prep()

  folds_without_splines <- vfold_cv(juice(rec_without_splines), strata = prediction)
  folds_with_splines <- vfold_cv(juice(rec_with_splines), strata = prediction)

  mod <- linear_reg() %>%
    set_engine("lm")

  mod_without_splines <- fit_resamples(mod,
                                       rec_without_splines,
                                       folds_without_splines,
                                       metrics = metric_set(!!metric),
                                       control = control_resamples(save_pred = TRUE)) %>%
    mutate(pred = "no_splines")

  mod_with_splines <- fit_resamples(mod,
                                    rec_with_splines,
                                    folds_with_splines,
                                    metrics = metric_set(!!metric),
                                    control = control_resamples(save_pred = TRUE)) %>%
    mutate(pred = "with_splines")

  res <- mod_without_splines %>%
    bind_rows(mod_with_splines)

  return (res)
}
Basically, the argument data takes a two-column table, and outcome is the name of the outcome column. Setting aside whether this function is useful (I'm new to tidymodels and just playing around), I want to understand what is causing this error and how to fix it. The error occurs when evaluating mod_with_splines.
A similar problem was encountered here, but I don't know whether it relates to mine. As far as I can tell, I have to prep the recipe before passing it to fit_resamples().
Any help would be appreciated. Thanks.
Upvotes: 1
Views: 511
Reputation: 3185
Your issue comes from applying a recipe to a dataset that has already been run through that same recipe.
If we assume the predictor variables are X1 and X2, then rec_with_splines expects those variables. But folds_with_splines is built from the juiced results of rec_with_splines, so it actually contains X1_ns_1, X1_ns_2, X2_ns_1, and X2_ns_2, not X1 and X2.
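The mismatch can be seen directly by comparing the columns a prepped recipe expects with the columns it produces. The following is a minimal sketch, assuming a made-up data frame dat with predictors X1 and X2 and an outcome y (these names and the deg_free value are illustrative only, not taken from the question):

```r
library(tidymodels)

# Hypothetical two-predictor data set, used only for illustration
set.seed(1)
dat <- data.frame(y = rnorm(50), X1 = rnorm(50), X2 = rnorm(50))

rec <- recipe(y ~ ., data = dat) %>%
  step_ns(all_predictors(), deg_free = 2) %>%
  prep()

# Columns the recipe expects to receive as input
expected_cols <- summary(rec, original = TRUE)$variable
print(expected_cols)

# Columns present after the spline step has been applied
juiced_cols <- names(juice(rec))
print(juiced_cols)

# Input columns that are missing from the juiced data
print(setdiff(expected_cols, juiced_cols))
```

Building the folds from juice(rec) and then handing rec to fit_resamples() asks the recipe to find X1 and X2 in data that no longer has them, which is what the error message reports.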
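I would suggest using workflows to combine the preprocessing and modeling steps, and passing the raw data into vfold_cv(). fit_resamples() then preps the recipe on each resample, so there is no need to call prep() yourself.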
library(tidymodels)

compare_basis_exp_to_base_mod <- function (data, outcome, metric, ...) {
  outcome <- rlang::enquo(outcome)
  metric <- rlang::enquo(metric)

  pred_list <- colnames(data)
  outcome_str <- substring(deparse(substitute(outcome)), 2)
  outcome_str_id <- which(colnames(data) %in% outcome_str)
  predictor <- pred_list[-outcome_str_id]

  # Give the outcome a fixed name so the recipe formula can refer to it
  data <- data %>%
    rename(prediction = !!outcome)

  # Recipes are left unprepped; they are estimated within each resample
  rec_without_splines <- recipe(prediction ~ ., data = data)
  rec_with_splines <- recipe(prediction ~ ., data = data) %>%
    step_ns(all_predictors(), ...)

  mod <- linear_reg() %>%
    set_engine("lm")

  # Bundle each recipe with the model
  wf_without_splines <- workflow() %>%
    add_recipe(rec_without_splines) %>%
    add_model(mod)
  wf_with_splines <- workflow() %>%
    add_recipe(rec_with_splines) %>%
    add_model(mod)

  # Resample the raw data, not the juiced output of a recipe
  data_folds <- vfold_cv(data, strata = prediction)

  mod_without_splines <- fit_resamples(wf_without_splines,
                                       data_folds,
                                       metrics = metric_set(!!metric),
                                       control = control_resamples(save_pred = TRUE)) %>%
    mutate(pred = "no_splines")

  mod_with_splines <- fit_resamples(wf_with_splines,
                                    data_folds,
                                    metrics = metric_set(!!metric),
                                    control = control_resamples(save_pred = TRUE)) %>%
    mutate(pred = "with_splines")

  res <- mod_without_splines %>%
    bind_rows(mod_with_splines)

  return (res)
}
res <- compare_basis_exp_to_base_mod(mtcars, mpg, rmse)
Upvotes: 2