for_the_love_of_cod

Reputation: 81

How do I extract the classification tree from this parsnip model in R?

I am working through 'Machine Learning with R: Expert techniques for predictive modeling' by Brett Lantz, using the tidymodels suite for the example modeling exercises.

I am on chapter 5, in which you build a decision tree with the C5.0 algorithm. I have created the model using the code shown below:

c5_v1 <- C5_rules() %>%
  set_mode('classification') %>%
  set_engine('C5.0')

c5_res_1 <- fit(object = c5_v1, formula = default ~ ., data = credit_train)

This has worked successfully:

parsnip model object


Call:
C5.0.default(x = x, y = y, trials = trials, rules = TRUE, control
 = C50::C5.0Control(minCases = minCases, seed = sample.int(10^5, 1), earlyStopping
 = FALSE))

Rule-Based Model
Number of samples: 900 
Number of predictors: 20 

Number of Rules: 22 

Non-standard options: attempt to group attributes

Try as I might, despite Googling and reading the parsnip documentation, I cannot find out how to view the decision tree. Can anyone tell me how to view the actual tree it has created?

Upvotes: 4

Views: 256

Answers (1)

Desmond

Reputation: 1137

Note that C5_rules() is a specification for a rule-based classification model, not a decision tree. After fitting with C5_rules(), you should therefore expect the output to be a set of rules rather than a tree.

The C5.0 engine can produce either a decision tree or a rule set: use decision_tree() with the C5.0 engine for a tree, and C5_rules() for rules. In both cases, call extract_fit_engine() on the fitted model to obtain the engine-specific fit embedded in the parsnip model fit, then summary() to print the tree or the rules.

library(tidymodels)
library(rules)
#> 
#> Attaching package: 'rules'
#> The following object is masked from 'package:dials':
#> 
#>     max_rules
data(penguins, package = "modeldata")

#model specification
C5_decision_tree <- decision_tree() |> 
  set_engine("C5.0") |> 
  set_mode("classification")

C5_rules <- C5_rules() |> 
  #no need to set engine because only C5.0 is used for C5_rules()
  #verify with show_engines("C5_rules")
  set_mode("classification")

#fitting the models
C5_decision_tree_fitted <- C5_decision_tree |> 
  fit(species ~ ., data = penguins)

C5_rules_fitted <- C5_rules |> 
  fit(species ~ ., data = penguins)

#extracting decision tree
C5_decision_tree_fitted |> 
  extract_fit_engine() |> 
  summary()
#> 
#> Call:
#> C5.0.default(x = x, y = y, trials = 1, control = C50::C5.0Control(minCases =
#>  2, sample = 0))
#> 
#> 
#> C5.0 [Release 2.07 GPL Edition]      Mon Jul  4 09:32:16 2022
#> -------------------------------
#> 
#> Class specified by attribute `outcome'
#> 
#> Read 333 cases (7 attributes) from undefined.data
#> 
#> Decision tree:
#> 
#> flipper_length_mm > 206:
#> :...island = Biscoe: Gentoo (118)
#> :   island in {Dream,Torgersen}:
#> :   :...bill_length_mm <= 46.5: Adelie (2)
#> :       bill_length_mm > 46.5: Chinstrap (5)
#> flipper_length_mm <= 206:
#> :...bill_length_mm > 43.3:
#>     :...island in {Biscoe,Torgersen}: Adelie (4/1)
#>     :   island = Dream: Chinstrap (59/1)
#>     bill_length_mm <= 43.3:
#>     :...bill_length_mm <= 42.3: Adelie (134/1)
#>         bill_length_mm > 42.3:
#>         :...sex = female: Chinstrap (4)
#>             sex = male: Adelie (7)
#> 
#> 
#> Evaluation on training data (333 cases):
#> 
#>      Decision Tree   
#>    ----------------  
#>    Size      Errors  
#> 
#>       8    3( 0.9%)   <<
#> 
#> 
#>     (a)   (b)   (c)    <-classified as
#>    ----  ----  ----
#>     145     1          (a): class Adelie
#>       1    67          (b): class Chinstrap
#>       1         118    (c): class Gentoo
#> 
#> 
#>  Attribute usage:
#> 
#>  100.00% flipper_length_mm
#>   64.56% bill_length_mm
#>   56.46% island
#>    3.30% sex
#> 
#> 
#> Time: 0.0 secs

#extracting rules
C5_rules_fitted |> 
  extract_fit_engine() |> 
  summary()
#> 
#> Call:
#> C5.0.default(x = x, y = y, trials = trials, rules = TRUE, control
#>  = C50::C5.0Control(minCases = minCases, seed = sample.int(10^5,
#>  1), earlyStopping = FALSE))
#> 
#> 
#> C5.0 [Release 2.07 GPL Edition]      Mon Jul  4 09:32:16 2022
#> -------------------------------
#> 
#> Class specified by attribute `outcome'
#> 
#> Read 333 cases (7 attributes) from undefined.data
#> 
#> Rules:
#> 
#> Rule 1: (68, lift 2.2)
#>  bill_length_mm <= 43.3
#>  sex = male
#>  ->  class Adelie  [0.986]
#> 
#> Rule 2: (208/64, lift 1.6)
#>  flipper_length_mm <= 206
#>  ->  class Adelie  [0.690]
#> 
#> Rule 3: (48, lift 4.8)
#>  island = Dream
#>  bill_length_mm > 46.5
#>  ->  class Chinstrap  [0.980]
#> 
#> Rule 4: (34/1, lift 4.6)
#>  bill_length_mm > 42.3
#>  flipper_length_mm <= 206
#>  sex = female
#>  ->  class Chinstrap  [0.944]
#> 
#> Rule 5: (118, lift 2.8)
#>  island = Biscoe
#>  flipper_length_mm > 206
#>  ->  class Gentoo  [0.992]
#> 
#> Default class: Adelie
#> 
#> 
#> Evaluation on training data (333 cases):
#> 
#>          Rules     
#>    ----------------
#>      No      Errors
#> 
#>       5    2( 0.6%)   <<
#> 
#> 
#>     (a)   (b)   (c)    <-classified as
#>    ----  ----  ----
#>     146                (a): class Adelie
#>       1    67          (b): class Chinstrap
#>             1   118    (c): class Gentoo
#> 
#> 
#>  Attribute usage:
#> 
#>   97.90% flipper_length_mm
#>   49.85% island
#>   40.84% bill_length_mm
#>   30.63% sex
#> 
#> 
#> Time: 0.0 secs

Created on 2022-07-04 by the reprex package (v2.0.1)
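
Applied to the model in the question, the change amounts to swapping C5_rules() for decision_tree() and leaving the rest as it was. The following is a minimal sketch of that swap, not a tested fix for the book's data: it uses the built-in iris data as a stand-in so it runs on its own, and the object names (tree_spec, tree_fit, c5_engine) are made up for illustration. It assumes the C50 package is installed.

```r
library(tidymodels)  # attaches parsnip: decision_tree(), fit(), extract_fit_engine()

# Tree specification instead of C5_rules(); same engine, same mode.
tree_spec <- decision_tree() |>
  set_engine("C5.0") |>
  set_mode("classification")

# Stand-in data so the sketch is self-contained. For the question's case,
# replace `Species ~ .` with `default ~ .` and `iris` with `credit_train`.
tree_fit <- fit(tree_spec, Species ~ ., data = iris)

# Pull out the underlying C50 object and print the tree it contains.
c5_engine <- extract_fit_engine(tree_fit)
summary(c5_engine)
```

With credit_train substituted in, the summary() output should contain a "Decision tree:" section like the penguins example above, in place of the "Rules:" section that C5_rules() produces.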

Upvotes: 4
