R - Interpreting Random Forest Importance

Question

I'm working with random forest models in R as part of an independent research project. I have fit my random forest model and generated the overall importance of each predictor to the model's accuracy. However, in order to interpret my results in a research paper, I need to understand whether the variables have a positive or negative impact on the response variable.

Is there a way to produce this information from a random forest model? For example, I expect age to have a positive impact on the likelihood that a surgical complication occurs, but the presence of osteoarthritis not so much.

Code:

# The cutoff is the probability for each group selection; probabilities of 10% or higher are classified as 'Complication' occurring
surgery.bagComp = randomForest(complication ~ ahrq_ccs + age + asa_status + bmi +
                                 baseline_cancer + baseline_cvd + baseline_dementia +
                                 baseline_diabetes + baseline_digestive + baseline_osteoart +
                                 baseline_psych + baseline_pulmonary,
                               data = surgery, mtry = 2, importance = TRUE,
                               cutoff = c(0.90, 0.10))

surgery.bagComp #Get stats for random forest model

imp=as.data.frame(importance(surgery.bagComp)) #Analyze the importance of each variable in the model
imp = cbind(vars=rownames(imp), imp)
imp = imp[order(imp$MeanDecreaseAccuracy),]
imp$vars = factor(imp$vars, levels=imp$vars)
dotchart(imp$MeanDecreaseAccuracy, imp$vars, 
         xlim=c(0,max(imp$MeanDecreaseAccuracy)), pch=16,xlab = "Mean Decrease Accuracy",main = "Complications - Variable Importance Plot",color="black")

Importance Plot: [figure: dot chart of Mean Decrease Accuracy per predictor]

Any suggestions or pointers to areas of research would be greatly appreciated.

user1808924 · Accepted Answer

In order to interpret my results in a research paper, I need to understand whether the variables have a positive or negative impact on the response variable.

You need to perform "feature impact" analysis, not "feature importance" analysis.

Algorithmically, it's about traversing decision tree data structures and observing what was the impact of each split on the prediction outcome. For example, consider the split "age <= 40". Does the left branch (condition evaluates to true) carry lower likelihood than the right branch (condition evaluates to false)?

Feature importances may give you a hint about which features to look at, but they cannot be "transformed" into feature impacts.
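One way to get at the direction of an effect without walking the trees by hand is a partial dependence calculation. Note that this is a different technique from the split-by-split traversal described above: it treats the forest as a black box, sets one predictor to a fixed value for every row, and averages the predicted probability. The sketch below is only illustrative. It uses a synthetic data frame (toy) as a stand-in for the surgery data, and the helper partial_dependence is a hypothetical name, not part of the randomForest package.

```r
library(randomForest)

set.seed(42)

# Synthetic stand-in for the surgery data (an assumption for illustration):
# complication risk rises with age, bmi is pure noise.
n <- 500
toy <- data.frame(age = runif(n, 20, 90), bmi = rnorm(n, 27, 4))
p <- plogis((toy$age - 55) / 10)
toy$complication <- factor(ifelse(runif(n) < p, "Yes", "No"),
                           levels = c("No", "Yes"))

rf <- randomForest(complication ~ age + bmi, data = toy, ntree = 200)

# Hypothetical helper: for each grid value, overwrite `var` in every row,
# then average the predicted probability of the chosen class.
partial_dependence <- function(model, data, var, grid, class = "Yes") {
  sapply(grid, function(v) {
    data[[var]] <- v
    mean(predict(model, newdata = data, type = "prob")[, class])
  })
}

age_grid <- seq(25, 85, by = 10)
pd_age <- partial_dependence(rf, toy, "age", age_grid)

# Sign of the rank correlation between grid value and averaged probability:
# +1 suggests a positive impact, -1 a negative one.
direction <- sign(cor(age_grid, pd_age, method = "spearman"))

print(data.frame(age = age_grid, mean_prob_yes = round(pd_age, 3)))
print(direction)
```

The randomForest package also ships a partialPlot() function (for example, partialPlot(rf, toy, "age", which.class = "Yes")) that draws a similar curve directly. A single sign only summarizes a monotone effect, so it is worth looking at the whole curve before reporting a predictor as "positive" or "negative".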

You might find the following articles helpful: WHY did your model predict THAT? (Part 1 of 2) and WHY did your model predict THAT? (Part 2 of 2).
