S Front

Reputation: 353

wrap tidymodels recipe into function

Is it possible to wrap a tidymodels recipe in a function? I've tried the following:

# Data setup
library(tidyverse)
library(tidymodels)

parks <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-06-22/parks.csv')

modeling_df <- parks %>% 
  select(pct_near_park_data, spend_per_resident_data, med_park_size_data) %>% 
  rename(nearness = "pct_near_park_data",
         spending = "spend_per_resident_data",
         acres = "med_park_size_data") %>% 
  mutate(nearness = (parse_number(nearness)/100)) %>% 
  mutate(spending = parse_number(spending))

# Start building models
set.seed(123)
park_split <- initial_split(modeling_df)
park_train <- training(park_split)
park_test <- testing(park_split)

Works well without function:

tree_rec <- recipe(nearness ~., data = park_train)

Problem: wrapping the recipe in a function fails:

custom_rec <- function(dat, var){
  tree_rec <- recipe(nearness ~ {{var}}, data = dat)
}

custom_rec(park_train, spending)

Error:

Error during wrapup: No in-line functions should be used here; use steps to define baking actions.
Error: no more error handlers available (recursive errors?); invoking 'abort' restart

Upvotes: 2

Views: 797

Answers (1)

Julia Silge

Reputation: 11613

The R formula is an extremely useful but weird, weird thing, so I don't recommend trying to mess around with it in a situation like the one you have here.

Instead, try using the update_role() interface for recipes:

library(tidymodels)
library(modeldata)
data(biomass)

# split data
biomass_tr <- biomass[biomass$dataset == "Training",]

my_rec <- function(dat, preds) {
  recipe(dat) %>%
    update_role({{preds}}, new_role = "predictor") %>%
    update_role(HHV, new_role = "outcome") %>%
    update_role(sample, new_role = "id variable") %>%
    update_role(dataset, new_role = "splitting indicator")
}

my_rec(biomass_tr, carbon) %>% prep() %>% summary()
#> # A tibble: 8 × 4
#>   variable type    role                source  
#>   <chr>    <chr>   <chr>               <chr>   
#> 1 sample   nominal id variable         original
#> 2 dataset  nominal splitting indicator original
#> 3 carbon   numeric predictor           original
#> 4 hydrogen numeric <NA>                original
#> 5 oxygen   numeric <NA>                original
#> 6 nitrogen numeric <NA>                original
#> 7 sulfur   numeric <NA>                original
#> 8 HHV      numeric outcome             original
my_rec(biomass_tr, c(carbon, hydrogen, oxygen, nitrogen)) %>% prep() %>% summary()
#> # A tibble: 8 × 4
#>   variable type    role                source  
#>   <chr>    <chr>   <chr>               <chr>   
#> 1 sample   nominal id variable         original
#> 2 dataset  nominal splitting indicator original
#> 3 carbon   numeric predictor           original
#> 4 hydrogen numeric predictor           original
#> 5 oxygen   numeric predictor           original
#> 6 nitrogen numeric predictor           original
#> 7 sulfur   numeric <NA>                original
#> 8 HHV      numeric outcome             original

Created on 2021-09-21 by the reprex package (v2.0.1)
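
The same pattern could be carried over to the columns from the question. This is only a sketch: `park_rec()` is a hypothetical name, the outcome is assumed to be `nearness`, and `toy` is a small made-up data frame standing in for `park_train` so the example runs without downloading anything.

```r
library(recipes)

# Hypothetical adaptation of my_rec() to the question's columns.
# Assumes the outcome column is called `nearness`.
park_rec <- function(dat, preds) {
  recipe(dat) %>%
    # {{ }} forwards the bare column name(s) to the tidyselect interface
    update_role({{ preds }}, new_role = "predictor") %>%
    update_role(nearness, new_role = "outcome")
}

# Made-up stand-in for park_train (values are illustrative only)
toy <- data.frame(
  nearness = c(0.5, 0.7, 0.9),
  spending = c(100, 150, 200),
  acres    = c(1.2, 3.4, 5.6)
)

# One predictor or several, passed as bare names
rec <- park_rec(toy, c(spending, acres))
summary(rec)
```

Calling `park_rec(toy, spending)` instead should leave `acres` with a missing role, just as `sulfur` has above.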

If you are set on the formula interface, maybe try using rlang::new_formula().
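
As far as I understand it, the original attempt fails because `{{ }}` is not processed inside a formula: the braces stay in the formula as literal calls, which recipes then rejects as in-line functions. Below is a minimal sketch of the `rlang::new_formula()` route, assuming a single predictor passed as a bare name and an outcome called `nearness`; `toy` is a made-up stand-in for `park_train`.

```r
library(recipes)
library(rlang)

# Hypothetical rewrite of the question's custom_rec().
# Assumes the outcome column is called `nearness`.
custom_rec <- function(dat, var) {
  # Capture the bare column name as a symbol
  pred <- ensym(var)
  # Build the formula nearness ~ <var> programmatically
  f <- new_formula(lhs = quote(nearness), rhs = pred)
  recipe(f, data = dat)
}

# Made-up stand-in for park_train (values are illustrative only)
toy <- data.frame(
  nearness = c(0.5, 0.7, 0.9),
  spending = c(100, 150, 200),
  acres    = c(1.2, 3.4, 5.6)
)

rec <- custom_rec(toy, spending)
summary(rec)
```

For several predictors, the `update_role()` approach is probably simpler, since the right-hand side of the formula would otherwise have to be assembled term by term.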

Upvotes: 2
