Reputation: 361
I am creating a recipe in which I first create a calculated column called "response", like so:
rec <- recipe( ~., data = training) %>%
step_mutate(response = as.integer(all(c('A', 'B') %in% Col4) & Col4 == 'A'))
I would now like to specify this new calculated column as the response variable in the recipe() call, as shown below. I will be doing a series of operations on it, starting with step_naomit. How do I re-specify the response in recipe() so that it is the calculated column from my previous step (above), using recipes?
recipe <- recipe(response ~ ., data = training) %>%
  step_naomit(response)
Upvotes: 1
Views: 330
Reputation: 3185
This is related to the question "tidymodel error, when calling predict function is asking for target variable".
It is generally not advisable to modify the response inside your recipe, because the response variable won't be available to the recipe in certain cases, such as when predicting on new data or when using {tune}. I would recommend that you perform this transformation before you pass the data to the recipe. Even better, do it before you split the data.
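library(tidymodels)
set.seed(1234)
data_split <- my_data %>%
  # use dplyr::mutate() here: step_mutate() only works inside a recipe
  mutate(response = as.integer(all(c('A', 'B') %in% Col4) & Col4 == 'A')) %>%
  initial_split()
training <- training(data_split)
testing <- testing(data_split)
# `response` now exists in the data, so it can be used in the formula
rec <- recipe(response ~ ., data = training)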
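For a runnable version of this approach, here is a minimal sketch that swaps in a small made-up data frame for `my_data` (its `Col4` values, the `x` column, and the single missing value are assumptions for illustration only). Because `response` is computed with `mutate()` before the split, it already exists when `recipe()` is called, so later steps such as `step_naomit()` can select it by name:

```r
library(tidymodels)

# Hypothetical stand-in for `my_data`: Col4 is a character column with one NA,
# and x is an arbitrary numeric predictor.
my_data <- tibble(
  Col4 = c(NA, rep(c("A", "B", "C"), length.out = 99)),
  x    = seq_len(100)
)

set.seed(1234)
data_split <- my_data %>%
  # The NA in Col4 produces an NA response (TRUE & NA is NA)
  mutate(response = as.integer(all(c("A", "B") %in% Col4) & Col4 == "A")) %>%
  initial_split()

training <- training(data_split)
testing  <- testing(data_split)

# The outcome is an ordinary column now, so later steps can refer to it
rec <- recipe(response ~ ., data = training) %>%
  step_naomit(response)

# Processed training data; rows with a missing response are dropped at prep()
baked <- bake(prep(rec), new_data = NULL)
```

Note that `step_naomit()` defaults to `skip = TRUE`, so rows are removed from the training data at `prep()` time but not when the recipe is later applied to new data.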
Upvotes: 3
Reputation: 206232
You can set the role for new columns in the step_mutate() function by explicitly setting the role= parameter.
library(recipes)
rec <- recipe(~ ., data = iris) %>%
  step_mutate(SepalSquared = Sepal.Length ^ 2, role = "outcome")
Then check that it worked with summary(prep(rec)):
  variable     type    role      source
  <chr>        <chr>   <chr>     <chr>
1 Sepal.Length numeric predictor original
2 Sepal.Width  numeric predictor original
3 Petal.Length numeric predictor original
4 Petal.Width  numeric predictor original
5 Species      nominal predictor original
6 SepalSquared numeric outcome   derived
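Applied to the column from the question, the role-based approach might look like the minimal sketch below; the toy `training` tibble is made up for illustration. Because the outcome is created inside the recipe, `step_mutate()` re-evaluates the expression on whatever data is being baked, so the `all(...)` check is computed on that data rather than on the training set, and `Col4` must be present in it.

```r
library(recipes)
library(tibble)

# Hypothetical training data standing in for the question's `training`
training <- tibble(
  Col4 = c("A", "B", NA, "C", "A", "B"),
  x    = 1:6
)

rec <- recipe(~ ., data = training) %>%
  # Create the outcome inside the recipe and give it the "outcome" role
  step_mutate(
    response = as.integer(all(c("A", "B") %in% Col4) & Col4 == "A"),
    role = "outcome"
  ) %>%
  # The derived column can now be selected by name in later steps
  step_naomit(response)

prepped <- prep(rec)
summary(prepped)

# Processed training data: the row with NA in Col4 had an NA response and is gone
baked <- bake(prepped, new_data = NULL)
```

The `NA` in `Col4` yields an `NA` response, which `step_naomit(response)` then removes from the prepped training data.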
Upvotes: 3