I'm new to `recipes` and am having some issues with the API. Why can't I `bake` or `juice` my recipe steps when I've removed certain features that I'm not interested in?
```r
library(tidymodels)

set.seed(999)
train_test_split <- initial_split(mtcars)
mtcars_train <- training(train_test_split)
mtcars_test <- testing(train_test_split)

mtcars_train %>%
  recipe(mpg ~ cyl + disp + hp + gear) %>%
  step_rm(qsec, vs, carb) %>%
  step_center(all_numeric()) %>%
  step_scale(all_numeric()) %>%
  prep(training = mtcars_train)
```
results in:

```
Error in .f(.x[[i]], ...) : object 'qsec' not found
```
This is annoying, because it means I have to remove the columns manually from both the training and test sets after the steps have been applied:
```r
rec_scale <- mtcars %>%
  recipe(mpg ~ cyl + disp + hp + gear) %>%
  step_center(all_numeric()) %>%
  step_scale(all_numeric()) %>%
  prep(training = mtcars_train)

train <- juice(rec_scale) %>%
  select(-qsec, -vs, -carb)

test <- bake(rec_scale, mtcars_test) %>%
  select(-qsec, -vs, -carb)
```
Am I thinking about this wrong? I could drop the columns from the data beforehand instead, but I would expect the recipe itself to handle that.
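Every column that a recipe step refers to must be included in the `recipe()` call; a step can't remove a column that isn't part of the recipe. In your case the formula `mpg ~ cyl + disp + hp + gear` already leaves `qsec`, `vs` and `carb` out, so there is nothing to remove and you can simply drop `step_rm()`: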
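If you do want to keep `step_rm()`, a minimal sketch is to put every column into the recipe with `mpg ~ .` so the step can find the columns it removes. The object names `rec_rm`, `train` and `test` are illustrative. Note that this version also keeps `drat`, `wt` and `am`, which your original formula excluded.

```r
library(tidymodels)  # attaches recipes, rsample, dplyr, ...

set.seed(999)
train_test_split <- initial_split(mtcars)
mtcars_train <- training(train_test_split)
mtcars_test <- testing(train_test_split)

# `mpg ~ .` puts every column of mtcars into the recipe,
# so step_rm() can locate qsec, vs and carb and drop them during prep().
rec_rm <-
  mtcars_train %>%
  recipe(mpg ~ .) %>%
  step_rm(qsec, vs, carb) %>%
  step_center(all_numeric()) %>%
  step_scale(all_numeric()) %>%
  prep(training = mtcars_train)

# The removal is part of the recipe, so it applies to both sets.
train <- juice(rec_rm)
test <- bake(rec_rm, new_data = mtcars_test)

names(train)
```

In newer `recipes` releases `juice()` is superseded by `bake(rec_rm, new_data = NULL)`, which returns the same processed training set.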
```r
library(tidymodels)
#> ── Attaching packages ────────────────────────────── tidymodels 0.0.2 ──
#> ✔ broom     0.5.2     ✔ purrr     0.3.2
#> ✔ dials     0.0.2     ✔ recipes   0.1.6
#> ✔ dplyr     0.8.3     ✔ rsample   0.0.5
#> ✔ ggplot2   3.2.0     ✔ tibble    2.1.3
#> ✔ infer     0.4.0.1   ✔ yardstick 0.0.3
#> ✔ parsnip   0.0.3
#> ── Conflicts ───────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()

set.seed(999)
train_test_split <- initial_split(mtcars)
mtcars_train <- training(train_test_split)
mtcars_test <- testing(train_test_split)

rec <-
  mtcars_train %>%
  recipe(mpg ~ cyl + disp + hp + gear) %>%
  step_center(all_numeric()) %>%
  step_scale(all_numeric()) %>%
  prep(training = mtcars_train)

summary(rec)
#> # A tibble: 5 x 4
#>   variable type    role      source
#>   <chr>    <chr>   <chr>     <chr>
#> 1 cyl      numeric predictor original
#> 2 disp     numeric predictor original
#> 3 hp       numeric predictor original
#> 4 gear     numeric predictor original
#> 5 mpg      numeric outcome   original
```
Created on 2019-08-04 by the reprex package (v0.2.1)
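Following on from the reprex, a sketch showing that no manual `select()` is needed afterwards: the recipe only contains the formula variables, and `bake()` (in the `recipes` versions I'm aware of) drops columns of `new_data` that are not part of the recipe, so both processed sets contain just those five columns. `train` and `test` are illustrative names.

```r
library(tidymodels)

set.seed(999)
train_test_split <- initial_split(mtcars)
mtcars_train <- training(train_test_split)
mtcars_test <- testing(train_test_split)

# Only cyl, disp, hp, gear (predictors) and mpg (outcome) enter the recipe.
rec <-
  mtcars_train %>%
  recipe(mpg ~ cyl + disp + hp + gear) %>%
  step_center(all_numeric()) %>%
  step_scale(all_numeric()) %>%
  prep(training = mtcars_train)

# qsec, vs and carb are never part of the recipe, so they are not returned.
train <- juice(rec)
test <- bake(rec, new_data = mtcars_test)

names(test)
```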