Reputation: 117
I'm running a random forest model using R's caret package, and running varImp on the returned object gives me the averaged variable importance across the number of bootstrap iterations. However, I would rather assess variable importance for each iteration. Is this possible using the caret package?
Reproducible example:
library(caret)
library(party)  # provides cforest_unbiased()

mod <- train(Species ~ ., data = iris,
             method = "cforest",
             controls = cforest_unbiased(ntree = 10))
varImp(mod)
returns:
cforest variable importance

              Overall
Petal.Width  100.0000
Petal.Length  86.6279
Sepal.Length   0.5814
Sepal.Width    0.0000
What I'm interested in is rather a list, with length equal to the number of bootstrap resamples, holding the variable importance for each iteration. This might be possible using some combination of returnResamp = "all" and a custom summaryFunction, but I'm not wise enough to know.
Upvotes: 2
Views: 1296
Reputation: 14331
Which bootstrapping iterations do you mean? The ones used internally by cforest or the resampling done by train?
train returns the importance scores produced by the final model object (which may not be the same as "averaged variable importance across the number of bootstrap iterations", depending on your answer to the first question).
If you want to get the resampled importance scores over train's resampling, you can trick rfe into doing it. For example:
set.seed(1)
mod <- rfe(x = iris[, 1:4], y = iris$Species, sizes = 4,
           rfeControl = rfeControl(functions = caretFuncs,
                                   method = "boot",
                                   number = 5),
           ## pass options to train()
           tuneGrid = data.frame(mtry = 2),
           method = "cforest",
           controls = cforest_unbiased(ntree = 10))
Then the importance scores for each iteration are in mod$variables.
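As a rough sketch of how that per-iteration output could be reshaped into the list asked for in the question, the snippet below splits a long data frame by resample label. It assumes mod$variables has columns named Overall, var and Resample (worth checking with str(mod$variables)); the helper name importance_by_resample and the toy values are invented for illustration, so the example runs in base R without caret.

```r
# Hypothetical helper: turn a long data frame of importance scores into
# a named list with one named numeric vector per resample.
# Assumes columns `Overall`, `var` and `Resample` (an assumption about
# the layout of mod$variables; verify with str(mod$variables)).
importance_by_resample <- function(vars) {
  stopifnot(all(c("Overall", "var", "Resample") %in% names(vars)))
  pieces <- split(vars, vars$Resample)
  lapply(pieces, function(d) setNames(d$Overall, d$var))
}

# Toy stand-in for mod$variables (values are invented, not real output).
toy <- data.frame(
  Overall  = c(0.30, 0.25, 0.28, 0.31),
  var      = c("Petal.Width", "Petal.Length", "Petal.Length", "Petal.Width"),
  Resample = c("Resample1", "Resample1", "Resample2", "Resample2"),
  stringsAsFactors = FALSE
)

imp_list <- importance_by_resample(toy)
```

If the column names match, importance_by_resample(mod$variables) should give one element per bootstrap resample.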
Max
Upvotes: 2
Reputation: 206616
After some digging around, I came up with:
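getvarimp <- function(x) {
  stopifnot(is(x, "train") && is(x$finalModel, "RandomForest"))
  # Copy party's varimp() and replace the last expression of its body
  # so that it returns the intermediate perror matrix.
  vi <- party:::varimp
  body(vi)[[length(body(vi))]] <- quote(return(perror))
  vi(x$finalModel)
}
getvarimp(mod)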
At least for this object type, this seems to be how varImp calculates its return value. Specifically, it takes the column means and rescales them to the 0-100 range:
vi <- colMeans(getvarimp(mod))
(vi - min(vi)) / max(vi - min(vi)) * 100
Note that each time you run this (or varImp) you may get a slightly different result, because the importance computation is stochastic and is redone on every call.
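To look at scaled scores per iteration rather than for the averaged vector, the same min-max rescaling can be applied row by row. This is only a minimal sketch in base R: the matrix below is made up and stands in for the output of getvarimp(mod), assuming one row per iteration and one column per predictor, and rescale_0_100 is a hypothetical helper name.

```r
# Min-max rescaling to the 0-100 range: shift so the minimum is 0,
# then divide by the new maximum. (Returns NaN if all values are equal.)
rescale_0_100 <- function(v) {
  shifted <- v - min(v)
  shifted / max(shifted) * 100
}

# Made-up importance matrix: one row per iteration, one column per predictor.
raw <- rbind(
  c(Petal.Width = 0.40, Petal.Length = 0.30, Sepal.Length = 0.10),
  c(Petal.Width = 0.20, Petal.Length = 0.25, Sepal.Length = 0.05)
)

# apply() over rows returns its results column-wise, so transpose back.
per_iteration <- t(apply(raw, 1, rescale_0_100))

# Rescaled average across iterations.
averaged <- rescale_0_100(colMeans(raw))
```

Rescaling each row separately makes the rows comparable in rank but not in magnitude, so the raw matrix is the better input if absolute differences between iterations matter.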
There may very well be other ways but I was unable to find any.
Upvotes: 0