I am working on a project to determine which variables best predict a binary outcome. I first fit a random forest and then calculate conditional variable importance to assess the importance of the variables for my subgroup analysis. Training the random forest takes a few minutes with the R package party, while calculating conditional variable importance takes hours, if not days, for larger datasets.
To calculate conditional variable importance I used either
Question: since the implementation in R is so slow, are there any packages in, say, Python that are faster? Or is it in the nature of this algorithm that it cannot be implemented any faster?
UPD: I believe the conditional option is the most time-consuming part, i.e. `conditional = TRUE` is much slower than `conditional = FALSE` in the code below:
```r
# using cforest from the party package:
library(party)
# Fit the model
cf <- cforest(Species ~ ., data = iris,
              controls = cforest_unbiased(ntree = 500, mtry = 3))
# Get variable importance
varimp(cf, conditional = TRUE, nperm = 10)
```
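To check the claim above on your own data, a minimal timing sketch could look like the following. It reuses the party calls from the question, but with a smaller forest (`ntree = 50`, an assumption chosen only so it runs quickly) and the default `nperm = 1`. Note that `nperm = 10` in the question repeats the permutation step ten times, so it roughly multiplies the cost of `varimp()` by ten.

```r
# Sketch: time unconditional vs conditional permutation importance
# with party::varimp on the same forest. ntree = 50 is an assumption
# to keep the example fast; absolute timings depend on your machine.
library(party)

set.seed(42)  # cforest resamples observations, so fix the seed
cf <- cforest(Species ~ ., data = iris,
              controls = cforest_unbiased(ntree = 50, mtry = 3))

# Unconditional (marginal) permutation importance
t_marg <- system.time(vi_marg <- varimp(cf, conditional = FALSE))

# Conditional permutation importance: each predictor is permuted within
# a grid built from split points of the covariates associated with it
t_cond <- system.time(vi_cond <- varimp(cf, conditional = TRUE))

print(rbind(marginal = vi_marg, conditional = vi_cond))
print(c(marginal = t_marg[["elapsed"]], conditional = t_cond[["elapsed"]]))
```

Running the same comparison on a subset of your real data gives a rough idea of how the conditional computation scales before committing to the full dataset.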
How long the analysis takes will vary with the size of the data, the number of predictors, the number of trees, etc. But some R packages can also be much more efficient than others. I would recommend first trying some alternative random forest functions, such as `ranger`, `cforest` or `Rborist`. Here are some simple examples:
```r
# using the ranger package:
library(ranger)
# Fit the model
rf <- ranger(Species ~ ., data = iris, importance = 'permutation')
# Get variable importance
rf$variable.importance

# using cforest from the party package:
library(party)
# Fit the model
cf <- cforest(Species ~ ., data = iris,
              controls = cforest_unbiased(ntree = 500, mtry = 3))
# Get variable importance
varimp(cf)
```
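Note that ranger's `importance = 'permutation'` is the ordinary (unconditional) permutation importance, so it is not a drop-in replacement for `conditional = TRUE`. If conditional importance is what you need, one option worth trying is the reimplementation of `cforest` in the partykit package, whose `varimp()` method for forests accepts (per its documentation) a `conditional` argument and a `cores` argument for parallel computation. The sketch below assumes partykit is installed; `cores` relies on forking via `parallel::mclapply`, so here it is only set on Unix-alikes, and `ntree = 50` is just to keep the example fast. Whether this is faster than party on your data is not guaranteed, so it is worth timing both on a subset first.

```r
# Sketch: conditional permutation importance with partykit's cforest.
# Assumes partykit is installed. The namespace is written out explicitly
# to avoid clashes if party is also loaded.
library(partykit)

set.seed(42)  # forest construction uses random subsampling
cf2 <- partykit::cforest(Species ~ ., data = iris, ntree = 50L, mtry = 3L)

# cores > 1 uses forking, which is unavailable on Windows; NULL means serial
n_cores <- if (.Platform$OS.type == "unix") 2L else NULL

# Conditional permutation importance, optionally computed in parallel
vi2 <- partykit::varimp(cf2, conditional = TRUE, cores = n_cores)
print(vi2)
```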