Is there a faster implementation of conditional variable importance for random forests than in R?

Question

I am working on a project to determine which variables best predict a binary outcome. I first fit a random forest and then calculate conditional variable importance to assess the importance of variables for my subgroup analysis. Training the random forest takes a few minutes with the R package party, while calculating conditional variable importance takes hours, if not days, for larger datasets.

To calculate conditional variable importance I used either

party::varimp in R based on the paper https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-307 or
permimp::permimp in R based on the later paper from the same authors https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03622-2
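For concreteness, here is a minimal sketch of the permimp call on a cforest, using iris as a stand-in for my data. The small ntree, the seed, and progressBar = FALSE are choices made only for this illustration, and reading the scores from the values element reflects my understanding of the returned object, so treat that part as an assumption.

```r
library(party)
library(permimp)

set.seed(1)  # illustrative seed so the forest is reproducible

# Small forest so the sketch finishes quickly (ntree = 50 is an assumption,
# not the setting used in the project)
cf_small <- cforest(Species ~ ., data = iris,
                    controls = cforest_unbiased(ntree = 50, mtry = 3))

# Conditional permutation importance from the permimp package
pi_cond <- permimp(cf_small, conditional = TRUE, progressBar = FALSE)

# Importance scores, one per predictor (assumed to live in $values)
print(pi_cond$values)
```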

Question: since the implementation in R is so slow, are there any packages, say in Python, that are faster? Or is it in the nature of this algorithm that it cannot be implemented any faster?

UPD: I believe the conditional option is the most time-consuming part, i.e. conditional = TRUE is much slower than conditional = FALSE in the code below.

# Using cforest from the party package
library(party)
# Fit the model
cf <- cforest(Species ~ ., data = iris,
              controls = cforest_unbiased(ntree = 500, mtry = 3))
# Get conditional variable importance
varimp(cf, conditional = TRUE, nperm = 10)
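To put numbers on the difference, a minimal timing sketch follows. It wraps both varimp calls in system.time on the same forest. The reduced ntree = 50 and nperm = 1, the seed, and the variable names are assumptions chosen only so the comparison runs quickly; absolute timings will depend on the machine and the data.

```r
library(party)

set.seed(42)  # illustrative seed for a reproducible forest and permutations

# Smaller forest than above so the comparison finishes quickly
cf_timing <- cforest(Species ~ ., data = iris,
                     controls = cforest_unbiased(ntree = 50, mtry = 3))

# Marginal (unconditional) permutation importance
t_marginal <- system.time(
  vi_marginal <- varimp(cf_timing, conditional = FALSE, nperm = 1)
)

# Conditional permutation importance
t_conditional <- system.time(
  vi_conditional <- varimp(cf_timing, conditional = TRUE, nperm = 1)
)

# Elapsed wall-clock seconds for each variant
print(c(marginal    = t_marginal[["elapsed"]],
        conditional = t_conditional[["elapsed"]]))
```

Running the same comparison on a subsample of the real data should give a rough idea of how the conditional run scales before committing to the full dataset.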
