How to set the splitting rule in decision_tree spec?

Question

When creating a specification and fitting a decision tree with the tidymodels metapackage and the decision_tree() function, the default splitting rule in the rpart package for categorical data is the Gini index. In rpart itself, the rule is set through the parms argument of rpart::rpart().

A random forest model created with the ranger engine also uses the same default for categorical data. My question is: how can I change the splitting rule to information gain or Shannon entropy?

Here is an example (focus on the str() calls and the formas_forest_fit object to see the split rules):

# install.packages(c("tidymodels", "rpart", "ranger"))
library(tidymodels)

formas <- tibble(
  Color = c("Rojo", "Azul", "Rojo", "Verde", "Rojo", "Verde"), 
  Forma = c("Cuadrado", "Cuadrado", "Redondo", "Cuadrado", "Redondo", "Cuadrado"), 
  `Tamaño` = c("Grande", "Grande", "Pequeño", "Pequeño", "Grande", "Grande"), 
  Compra = structure(c(2L, 2L, 1L, 1L, 2L, 1L), .Label = c("No", "Si"), class = "factor")
)

# Tree spec and fit -----------------------
formas_tree_spec <- 
  decision_tree(min_n = 2) %>% 
  set_mode("classification") %>% 
  set_engine("rpart")

formas_tree_fit <- 
  fit(
    formas_tree_spec, 
    data = formas, 
    formula = Compra ~ .
  )

# Forest spec and fit ----------------------
formas_forest_spec <- 
  rand_forest(trees = 5000, min_n = 2) %>% 
  set_mode("classification") %>% 
  set_engine("ranger") 

formas_forest_fit <- 
  fit(
    formas_forest_spec, 
    data = formas, 
    formula = Compra ~ .
  )

str(rpart::rpart)
str(ranger::ranger)
formas_forest_fit

dzegpi · Accepted Answer

Following Emil Hvidfeldt's suggestion, the set_engine() function lets us pass arguments directly to the engine function.

This is the tree spec with the information gain splitting rule:

formas_tree_spec <- 
  decision_tree(min_n = 2) %>% 
  set_mode("classification") %>% 
  set_engine("rpart", parms = list(split = "information"))
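
To check that the engine argument actually reaches rpart, one option is to inspect the translated spec and the fitted object. The following is a minimal sketch, not a definitive recipe: the data frame `formas_df` and the object names `tree_spec` and `tree_fit` are made up for illustration, and the reading of `parms$split` as an integer code is my understanding of how rpart stores the rule internally.

```r
library(tidymodels)

# Small illustrative data set (hypothetical, modeled on the question's tibble)
formas_df <- data.frame(
  Color  = c("Rojo", "Azul", "Rojo", "Verde", "Rojo", "Verde"),
  Forma  = c("Cuadrado", "Cuadrado", "Redondo", "Cuadrado", "Redondo", "Cuadrado"),
  Compra = factor(c("Si", "Si", "No", "No", "Si", "No")),
  stringsAsFactors = TRUE
)

# Engine-specific arguments go through set_engine()
tree_spec <-
  decision_tree(min_n = 2) %>%
  set_mode("classification") %>%
  set_engine("rpart", parms = list(split = "information"))

# Show the rpart::rpart() call template that parsnip will build
translate(tree_spec)

tree_fit <- fit(tree_spec, formula = Compra ~ ., data = formas_df)

# The underlying rpart object lives in the $fit element.
# Assumption: rpart keeps the rule as an integer code
# (1 = gini, 2 = information) in parms$split.
tree_fit$fit$parms$split
```

The same mechanism should apply to the forest: ranger::ranger() has a splitrule argument that can be passed via set_engine("ranger", splitrule = ...). As far as I know, its classification options are "gini", "extratrees" and "hellinger", so an information gain rule does not seem to be available there.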
