Reputation: 21
Could somebody show me how to generate permutation-based variable importance plots within the tidy modelling framework? Currently, I have this:
library(tidymodels)
# variable importance
final_fit_train %>%
pull_workflow_fit() %>%
vip(geom = "point",
aesthetics = list(color = cbPalette[4],
fill = cbPalette[4])) +
THEME +
ggtitle("Elastic Net")
which generates this:
However, I would like to have something like this
It's not clear to me how the rather new tidy modelling framework integrates with the current VIP package. Could anybody help? Thanks!
https://koalaverse.github.io/vip/articles/vip.html (API of the VIP package).
Upvotes: 2
Views: 2417
Reputation: 11613
To compute variable importance using permutation, you need just a few more pieces to put together, compared to using model-dependent variable importance.
Let's look at an example for an SVM model, which does not have a model-dependent variable importance score.
library(tidymodels)
#> ── Attaching packages ──────────────────────── tidymodels 0.1.1 ──
#> ✓ broom 0.7.0 ✓ recipes 0.1.13
#> ✓ dials 0.0.8 ✓ rsample 0.0.7
#> ✓ dplyr 1.0.0 ✓ tibble 3.0.3
#> ✓ ggplot2 3.3.2 ✓ tidyr 1.1.0
#> ✓ infer 0.5.3 ✓ tune 0.1.1
#> ✓ modeldata 0.0.2 ✓ workflows 0.1.2
#> ✓ parsnip 0.1.2 ✓ yardstick 0.0.7
#> ✓ purrr 0.3.4
#> ── Conflicts ─────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
#> x recipes::step() masks stats::step()
data("hpc_data")
svm_spec <- svm_poly(degree = 1, cost = 1/4) %>%
set_engine("kernlab") %>%
set_mode("regression")
svm_fit <- workflow() %>%
add_model(svm_spec) %>%
add_formula(compounds ~ .) %>%
fit(hpc_data)
svm_fit
#> ══ Workflow [trained] ════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: svm_poly()
#>
#> ── Preprocessor ──────────────────────────────────────────────────
#> compounds ~ .
#>
#> ── Model ─────────────────────────────────────────────────────────
#> Support Vector Machine object of class "ksvm"
#>
#> SV type: eps-svr (regression)
#> parameter : epsilon = 0.1 cost C = 0.25
#>
#> Polynomial kernel function.
#> Hyperparameters : degree = 1 scale = 1 offset = 1
#>
#> Number of Support Vectors : 2827
#>
#> Objective Function Value : -284.7255
#> Training error : 0.835421
Our model is now trained, so it's ready for computing variable importance. Notice a couple of steps:
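You pull() the fitted model object out of the workflow (here with pull_workflow_fit()).
You have to specify the target variable, compounds.
You need to pass both the training data and the underlying prediction function (here kernlab::predict()).
library(vip)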
#>
#> Attaching package: 'vip'
#> The following object is masked from 'package:utils':
#>
#> vi
svm_fit %>%
pull_workflow_fit() %>%
vip(method = "permute",
target = "compounds", metric = "rsquared",
pred_wrapper = kernlab::predict, train = hpc_data)
Created on 2020-07-17 by the reprex package (v0.3.0)
You can increase nsim here to do this more than once.
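To make it clearer why the call above needs a target, a metric, a prediction function, and the training data, here is a minimal base R sketch of the general idea behind permutation importance. It is not how vip implements it internally. The helper permutation_importance() is hypothetical, and the example uses lm() on mtcars rather than the SVM above so that it runs without extra packages.

```r
# Hypothetical helper (not part of vip): permutation importance by hand.
# For each predictor, shuffle its column, re-predict, and record how much
# R-squared drops relative to the unshuffled baseline.
permutation_importance <- function(fit, train, target, pred_fun, nsim = 10) {
  # R-squared as the squared correlation between truth and prediction
  rsq <- function(truth, estimate) cor(truth, estimate)^2

  baseline <- rsq(train[[target]], pred_fun(fit, train))
  features <- setdiff(names(train), target)

  sapply(features, function(feature) {
    drops <- replicate(nsim, {
      shuffled <- train
      shuffled[[feature]] <- sample(shuffled[[feature]])  # break the link to the outcome
      baseline - rsq(train[[target]], pred_fun(fit, shuffled))
    })
    mean(drops)  # average drop over the nsim permutations
  })
}

set.seed(123)
train_data <- mtcars[, c("mpg", "wt", "qsec")]
lm_fit <- lm(mpg ~ wt + qsec, data = train_data)

imp <- permutation_importance(
  fit      = lm_fit,
  train    = train_data,
  target   = "mpg",
  pred_fun = function(object, newdata) predict(object, newdata = newdata),
  nsim     = 20
)

# Larger values mean the model relies more on that predictor
sort(imp, decreasing = TRUE)
```

With a single permutation the scores depend on one random shuffle, which is why averaging over several repetitions (the role nsim plays here) gives more stable results.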
Upvotes: 3