Reputation: 353
Is it possible to wrap a tidymodels recipe in a function? I've tried the following:
# Data setup
library(tidyverse)
library(tidymodels)
parks <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-06-22/parks.csv')
modeling_df <- parks %>%
  select(pct_near_park_data, spend_per_resident_data, med_park_size_data) %>%
  rename(nearness = "pct_near_park_data",
         spending = "spend_per_resident_data",
         acres = "med_park_size_data") %>%
  mutate(nearness = (parse_number(nearness)/100)) %>%
  mutate(spending = parse_number(spending))
# Start building models
set.seed(123)
park_split <- initial_split(modeling_df)
park_train <- training(park_split)
park_test <- testing(park_split)
Works well without function:
tree_rec <- recipe(nearness ~., data = park_train)
Problem: wrapping the recipe in a function:
custom_rec <- function(dat, var){
  tree_rec <- recipe(nearness ~ {{var}}, data = dat)
}
custom_rec(park_train, spending)
Error:
Error during wrapup: No in-line functions should be used here; use steps to define baking actions.
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
Upvotes: 2
Views: 797
Reputation: 11613
The R formula is an extremely useful but weird, weird thing, so I don't recommend trying to mess around with it in a situation like the one you have here.
Instead, try using the update_role() interface for recipes:
library(tidymodels)
library(modeldata)
data(biomass)
# split data
biomass_tr <- biomass[biomass$dataset == "Training",]
my_rec <- function(dat, preds) {
  recipe(dat) %>%
    update_role({{preds}}, new_role = "predictor") %>%
    update_role(HHV, new_role = "outcome") %>%
    update_role(sample, new_role = "id variable") %>%
    update_role(dataset, new_role = "splitting indicator")
}
my_rec(biomass_tr, carbon) %>% prep() %>% summary()
#> # A tibble: 8 × 4
#>   variable type    role                source
#>   <chr>    <chr>   <chr>               <chr>
#> 1 sample   nominal id variable         original
#> 2 dataset  nominal splitting indicator original
#> 3 carbon   numeric predictor           original
#> 4 hydrogen numeric <NA>                original
#> 5 oxygen   numeric <NA>                original
#> 6 nitrogen numeric <NA>                original
#> 7 sulfur   numeric <NA>                original
#> 8 HHV      numeric outcome             original
my_rec(biomass_tr, c(carbon, hydrogen, oxygen, nitrogen)) %>% prep() %>% summary()
#> # A tibble: 8 × 4
#>   variable type    role                source
#>   <chr>    <chr>   <chr>               <chr>
#> 1 sample   nominal id variable         original
#> 2 dataset  nominal splitting indicator original
#> 3 carbon   numeric predictor           original
#> 4 hydrogen numeric predictor           original
#> 5 oxygen   numeric predictor           original
#> 6 nitrogen numeric predictor           original
#> 7 sulfur   numeric <NA>                original
#> 8 HHV      numeric outcome             original
Created on 2021-09-21 by the reprex package (v2.0.1)
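Applied to the data in the question, the same pattern might look like the sketch below. The `toy` data frame is a made-up stand-in for `park_train` (its values are invented for illustration), and `park_rec` is a hypothetical name; the outcome is hard-coded as `nearness`, as in the question.

```r
library(recipes)

# Made-up stand-in for park_train, using the question's column names
toy <- data.frame(
  nearness = c(0.5, 0.7, 0.9, 0.6),
  spending = c(50, 80, 120, 65),
  acres    = c(3, 5, 2, 4)
)

# Hypothetical wrapper: predictors are chosen with tidyselect syntax,
# the outcome is fixed to nearness
park_rec <- function(dat, preds) {
  recipe(dat) %>%
    update_role({{ preds }}, new_role = "predictor") %>%
    update_role(nearness, new_role = "outcome")
}

rec_one <- park_rec(toy, spending)
rec_two <- park_rec(toy, c(spending, acres))
summary(rec_one)
```

Columns that are not selected (here `acres` in `rec_one`) keep a missing role, just like `sulfur` in the biomass output above.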
If you are set on the formula interface, maybe try using rlang::new_formula().
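A minimal sketch of that formula-based route, assuming the outcome is always `nearness` as in the question. The `toy` data frame is a made-up stand-in for `park_train`, and `custom_rec_multi` is a hypothetical helper added here to show several predictors passed as strings; neither comes from the original posts.

```r
library(recipes)
library(rlang)

# Made-up stand-in for park_train, using the question's column names
toy <- data.frame(
  nearness = c(0.5, 0.7, 0.9, 0.6),
  spending = c(50, 80, 120, 65),
  acres    = c(3, 5, 2, 4)
)

# One predictor passed as a bare name: capture it as a symbol and
# build the formula nearness ~ <var> programmatically
custom_rec <- function(dat, var) {
  f <- new_formula(lhs = sym("nearness"), rhs = ensym(var))
  recipe(f, data = dat)
}

# Hypothetical variant: several predictors given as a character vector,
# combined into spending + acres + ... on the right-hand side
custom_rec_multi <- function(dat, vars) {
  rhs <- Reduce(function(a, b) call2("+", a, b), syms(vars))
  recipe(new_formula(lhs = sym("nearness"), rhs = rhs), data = dat)
}

rec <- custom_rec(toy, spending)
rec_multi <- custom_rec_multi(toy, c("spending", "acres"))
summary(rec)
```

Because the formula is assembled from symbols before it reaches `recipe()`, no `{{ }}` ever appears inside the formula itself, which is what triggered the "No in-line functions" error in the question.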
Upvotes: 2