Reputation: 7879
I am using tidymodels to fit multiple Random Forest models. I then followed along with this tutorial to compare the model results. The problem is that I get the error:
Error in UseMethod("anova") :
  no applicable method for 'anova' applied to an object of class "ranger"
As an example:
library(tidymodels)
set.seed(123)
iris <- iris %>%
  mutate(is_versicolor = ifelse(Species == "versicolor", "versicolor", "not_versicolor")) %>%
  mutate(is_versicolor = factor(is_versicolor, levels = c("versicolor", "not_versicolor")))
iris_split <- initial_split(iris, strata = is_versicolor, prop = 0.8)
iris_train <- training(iris_split)
iris_test <- testing(iris_split)
rec_normal <- recipe(is_versicolor ~ Petal.Width + Species, data = iris_train)
rec_interaction <- rec_normal %>%
step_interact(~ Petal.Width:starts_with("Species"))
iris_model <- rand_forest() %>% set_engine("ranger") %>% set_mode("classification")
# normal workflow
iris_wf <- workflow() %>%
add_model(iris_model) %>%
add_recipe(rec_normal)
# interaction workflow
iris_wf_interaction <- iris_wf %>%
update_recipe(rec_interaction)
# fit models
iris_normal_lf <- last_fit(iris_wf, split = iris_split)
iris_inter_lf <- last_fit(iris_wf_interaction, split = iris_split)
normalmodel <- iris_normal_lf %>% extract_fit_engine()
intermodel <- iris_inter_lf %>% extract_fit_engine()
anova(normalmodel, intermodel) %>% tidy()
How can I run an ANOVA or ANOVA-type comparison of these models, to see if one is significantly better?
Upvotes: 4
Views: 501
Reputation: 2155
This uses a different model pair and compares the models on classification accuracy across resamples; the approach is easily extended to other metrics (see the sketch after the resampling step below).
library(dplyr)
library(tibble)
library(ggplot2)
library(tidyr)
library(rsample)
library(recipes)
library(parsnip)
library(workflows)
library(tune)
library(yardstick)
library(workflowsets)
library(infer)
set.seed(123)
iris <- iris %>%
  mutate(is_versicolor = ifelse(Species == "versicolor", "versicolor", "not_versicolor")) %>%
  mutate(is_versicolor = factor(is_versicolor, levels = c("versicolor", "not_versicolor")))
iris_split <- initial_split(iris, strata = is_versicolor, prop = 0.8)
iris_train <- training(iris_split)
iris_test <- testing(iris_split)
# replacing normal and interaction recipes with models
# that give less than 100% accuracy.
rec_normal <- recipe(is_versicolor ~ Sepal.Width, data = iris_train)
rec_alternative <- recipe(is_versicolor ~ Sepal.Length, data = iris_train)
iris_model <- rand_forest() %>% set_engine("ranger") %>% set_mode("classification")
# Create folds
set.seed(234)
iris_folds <- vfold_cv(iris_train)
iris_folds
# Combine models into set
iris_set <-
workflow_set(
list(rec_normal, rec_alternative),
list(iris_model),
cross = TRUE
)
doParallel::registerDoParallel()
set.seed(2021)
# fit models
iris_rs <-
workflow_map(
iris_set,
"fit_resamples",
resamples = iris_folds
)
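As mentioned above, this extends to other metrics: fit_resamples accepts a yardstick metric set, and workflow_map should pass it through. A minimal sketch (roc_auc and mn_log_loss are just example metrics; iris_rs_multi is a new name so the rest of the code is unaffected):
# sketch: resample with several metrics instead of just the defaults
iris_rs_multi <-
  workflow_map(
    iris_set,
    "fit_resamples",
    resamples = iris_folds,
    metrics = yardstick::metric_set(accuracy, roc_auc, mn_log_loss)
  )
collect_metrics(iris_rs_multi)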
# Visualise model performance
autoplot(iris_rs)
# Extract resample accuracies
# (column 4 of the workflow set tibble is the "result" column; .metrics holds the per-fold metric tibbles)
model_1_rs <- iris_rs[1,][[4]][[1]]$.metrics
model_2_rs <- iris_rs[2,][[4]][[1]]$.metrics
model_acc <- tibble(model_1 = NA, model_2 = NA)
# vfold_cv() defaults to 10 folds; the first row of each per-fold .metrics tibble is accuracy
for (i in 1:10) {
  model_acc[i, 1] <- model_1_rs[[i]][[".estimate"]][1]
  model_acc[i, 2] <- model_2_rs[[i]][[".estimate"]][1]
}
model_acc <- model_acc |> pivot_longer(cols = everything(), names_to = "model", values_to = "acc")
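An alternative to the manual list indexing above is to collect the unsummarised metrics directly from the workflow set; a sketch (note that the model labels are then the wflow_id values, e.g. recipe_1_rand_forest, rather than model_1/model_2):
# sketch: per-fold accuracies via collect_metrics() instead of list indexing
model_acc_alt <- iris_rs %>%
  collect_metrics(summarize = FALSE) %>%
  filter(.metric == "accuracy") %>%
  transmute(model = wflow_id, acc = .estimate)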
# Do ANOVA
aov_results <- aov(acc ~ model, data = model_acc)
summary(aov_results)
ggplot(data = model_acc, aes(fill = model)) +
  geom_density(aes(x = acc), alpha = 0.2) +
  labs(x = "accuracy")
This gives the p-value:
> summary(aov_results)
Df Sum Sq Mean Sq F value Pr(>F)
model 1 0.0281 0.02813 1.378 0.256
Residuals 18 0.3674 0.02041
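Because both models were evaluated on the same ten folds, a paired comparison is arguably more appropriate than pooling the accuracies into a one-way ANOVA; a minimal sketch using the model_acc tibble built above:
# paired t-test on per-fold accuracies (the folds are shared between the two models)
acc_1 <- model_acc$acc[model_acc$model == "model_1"]
acc_2 <- model_acc$acc[model_acc$model == "model_2"]
t.test(acc_1, acc_2, paired = TRUE)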
Looking at the difference in model accuracies through a different lens, using the infer package. First, visualise the variation:
model_acc |> ggplot(aes(x = model, y = acc)) +
geom_boxplot() +
labs(y = 'accuracy')
Then calculate a test statistic:
observed_statistic <- model_acc %>%
specify(acc ~ model) %>%
calculate(stat = "diff in means", order = c("model_1", "model_2"))
observed_statistic
Then simulate the null distribution:
null_dist_2_sample <- model_acc %>%
specify(acc ~ model) %>%
hypothesize(null = "independence") %>%
generate(reps = 1000, type = "permute") %>%
calculate(stat = "diff in means" ,order = c("model_1", "model_2"))
and plot:
null_dist_2_sample %>%
visualize() +
shade_p_value(observed_statistic,
direction = "two-sided") +
labs(x = "test statistic")
p_value_2_sample <- null_dist_2_sample %>%
get_p_value(obs_stat = observed_statistic,
direction = "two-sided")
p_value_2_sample
# A tibble: 1 × 1
p_value
<dbl>
1 0.228
This is almost the same as the p-value from aov(). Note that, consistent with the accuracies of the two models being close, the p-value is high.
Upvotes: 2
Reputation: 41337
You could compare your random forest models by comparing their accuracies using the aov function. First, collect the accuracies with collect_metrics, save them in a data frame, and run a model with aov to get the results. Here is a reproducible example:
library(tidymodels)
set.seed(123)
iris <- iris %>%
  mutate(is_versicolor = ifelse(Species == "versicolor", "versicolor", "not_versicolor")) %>%
  mutate(is_versicolor = factor(is_versicolor, levels = c("versicolor", "not_versicolor")))
iris_split <- initial_split(iris, strata = is_versicolor, prop = 0.8)
iris_train <- training(iris_split)
iris_test <- testing(iris_split)
rec_normal <- recipe(is_versicolor ~ Petal.Width + Species, data = iris_train)
rec_interaction <- rec_normal %>%
step_interact(~ Petal.Width:starts_with("Species"))
iris_model <- rand_forest() %>% set_engine("ranger") %>% set_mode("classification")
# normal workflow
iris_wf <- workflow() %>%
add_model(iris_model) %>%
add_recipe(rec_normal)
# interaction workflow
iris_wf_interaction <- iris_wf %>%
update_recipe(rec_interaction)
# fit models
iris_normal_lf <- last_fit(iris_wf, split = iris_split)
iris_inter_lf <- last_fit(iris_wf_interaction, split = iris_split)
#> ! train/test split: preprocessor 1/1: Categorical variables used in `step_interact` should probably be avoided...
normalmodel <- iris_normal_lf %>% extract_fit_engine()
intermodel <- iris_inter_lf %>% extract_fit_engine()
# Check confusion matrix
iris_normal_lf %>%
collect_predictions() %>%
conf_mat(is_versicolor, .pred_class)
#> Truth
#> Prediction versicolor not_versicolor
#> versicolor 10 0
#> not_versicolor 0 20
iris_inter_lf %>%
collect_predictions() %>%
conf_mat(is_versicolor, .pred_class)
#> Truth
#> Prediction versicolor not_versicolor
#> versicolor 10 0
#> not_versicolor 0 20
# Extract accuracy of models and create dataset
acc_normalmodel <- iris_normal_lf %>% collect_metrics() %>% select(.estimate) %>% slice(1)
acc_intermodel <- iris_inter_lf %>% collect_metrics() %>% select(.estimate) %>% slice(1)
results = data.frame(model = c("normalmodel", "intermodel"),
accuracy = c(acc_normalmodel$.estimate, acc_intermodel$.estimate))
# perform ANOVA on the classification accuracy
aov_results <- aov(accuracy ~ model, data = results)
summary(aov_results)
#> Df Sum Sq Mean Sq
#> model 1 4.93e-32 4.93e-32
Created on 2022-12-15 with reprex v2.0.2
As you can see, the results don't show a p-value because the degrees of freedom are too low: with a single accuracy value per model there are no residual degrees of freedom left (see why do I not get a p-value from this anova in r).
You could also use aov on the predictions of both models and compare their performance. Here is a reproducible example:
# Get predictions of both models for not_versicolor
normalmodel_pred <- as.data.frame(normalmodel$predictions)$not_versicolor
intermodel_pred <- as.data.frame(intermodel$predictions)$not_versicolor
summary(aov(normalmodel_pred ~ intermodel_pred))
#> Df Sum Sq Mean Sq F value Pr(>F)
#> intermodel_pred 1 25.032 25.032 9392 <2e-16 ***
#> Residuals 118 0.314 0.003
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Created on 2022-12-17 with reprex v2.0.2
As you can see, the p-value is less than 0.05, which suggests that there is a difference between the predictions of the models; this is consistent with what you see if you look at the predicted probabilities themselves.
For more information about ANOVA, check this:
Upvotes: 4
Reputation: 2155
Just using your code, and adapting Julia Silge's blog post on workflowsets:
Predict #TidyTuesday giant pumpkin weights with workflowsets
As anova() is not available for ranger models, generate folds to resample instead:
set.seed(234)
iris_folds <- vfold_cv(iris_train)
iris_folds
Combine your recipes into a workflowset:
iris_set <-
workflow_set(
list(rec_normal, rec_interaction),
list(iris_model),
cross = TRUE
)
iris_set
Setup parallel processing:
doParallel::registerDoParallel()
set.seed(2021)
Fit using the folds:
iris_rs <-
workflow_map(
iris_set,
"fit_resamples",
resamples = iris_folds
)
autoplot(iris_rs)
This chart would usually address your question of how to compare the models. However, because Species is on the right-hand side of both recipe formulas, and the response is_versicolor is calculated from Species, the models are perfectly accurate.
Finish off the output:
collect_metrics(iris_rs)
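For a quick ranked tabular comparison of the workflows rather than the raw metrics, workflowsets also provides rank_results(); a minimal sketch:
# sketch: rank the workflows by mean resample accuracy
rank_results(iris_rs, rank_metric = "accuracy")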
final_fit <-
extract_workflow(iris_rs, "recipe_2_rand_forest") %>%
fit(iris_train)
There is no tidy() method for ranger models.
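You can, however, still pull the raw ranger object out of the fitted workflow and inspect it directly; a minimal sketch using final_fit from above:
# inspect the underlying ranger fit (number of trees, mtry, OOB prediction error)
final_fit %>%
  extract_fit_engine()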
In your code, if you change to:
rec_normal <- recipe(is_versicolor ~ Sepal.Length + Sepal.Width, data = iris_train)
rec_interaction <- recipe(is_versicolor ~ Petal.Width + Petal.Length, data = iris_train)
you can have some fun!
Hope this helps, Adam. I am just learning the wonderful tidymodels like you, and I look forward to comments. :-)
Upvotes: 4