Customizing Importance Plot - R

Question

I'm working on a write-up of a results analysis and want to change the MeanDecreaseAccuracy graph from the variable importance plot produced after fitting a random forest. I want to take only the MeanDecreaseAccuracy values and turn them into a barplot, which would make for a nicer visualization than what is currently displayed.

What is the best way to do this?

My current code (there is a lot going on before this, but for the purposes of this example it should be sufficient):

library(randomForest)
library(Hmisc)   # provides hist.data.frame()

wine <- read.csv("wine_dataset.csv")
# Binary response: 1 if quality >= 7, otherwise 0
wine$quality01 <- as.factor(ifelse(wine$quality >= 7, 1, 0))
summary(wine)
num_data <- wine[, sapply(wine, is.numeric)]
hist.data.frame(num_data)

set.seed(8, sample.kind = "Rounding") # Set the seed so results are repeatable
wine.bag <- randomForest(quality01 ~ alcohol + volatile_acidity + sulphates + residual_sugar + 
    chlorides + free_sulfur_dioxide + fixed_acidity + pH + density + 
    citric_acid, data=wine, mtry=3, importance=TRUE)    # Fit a random forest with mtry = 3

wine.bag #Review the Random Forest Results
plot(wine.bag) #Plot the Random Forest Results
imp=as.data.frame(importance(wine.bag)) #Analyze the importance of each variable in the model
imp=cbind(vars=rownames(imp),imp)
barplot(imp$MeanDecreaseAccuracy, names.arg=imp$vars)

My output is currently: [image not available]

The current code I am using can be found here: [link not available]

eipi10 · Accepted Answer

Here are a couple of options:

library(randomForest)
library(tidyverse)

# Random forest model
iris.rf <- randomForest(Species ~ ., data=iris, importance=TRUE)

# Get importance values as a data frame
imp = as.data.frame(importance(iris.rf))
imp = cbind(vars=rownames(imp), imp)
imp = imp[order(imp$MeanDecreaseAccuracy),]
imp$vars = factor(imp$vars, levels=unique(imp$vars))

barplot(imp$MeanDecreaseAccuracy, names.arg=imp$vars)

imp %>% 
  pivot_longer(cols=matches("Mean")) %>% 
  ggplot(aes(value, vars)) +
  geom_col() +
  geom_text(aes(label=round(value), x=0.5*value), size=3, colour="white") +
  facet_grid(. ~ name, scales="free_x") +
  scale_x_continuous(expand=expansion(c(0,0.04))) +
  theme_bw() +
  theme(panel.grid.minor=element_blank(),
        panel.grid.major=element_blank(),
        axis.title=element_blank())
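
Since the question asks for the MeanDecreaseAccuracy values only, here is a minimal sketch that skips the pivot and the facets and plots that one column. It reuses the iris model from above. The seed value, the margin sizes, and the axis label text are arbitrary choices for illustration and are not part of the original answer. The base version draws horizontal bars so that long variable names (such as those in the wine data) stay readable, and the ggplot version uses `reorder()` as a shorter alternative to building the factor levels by hand.

```r
library(randomForest)
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
iris.rf <- randomForest(Species ~ ., data = iris, importance = TRUE)

# Importance values as a data frame, sorted by MeanDecreaseAccuracy
imp <- as.data.frame(importance(iris.rf))
imp$vars <- rownames(imp)
imp <- imp[order(imp$MeanDecreaseAccuracy), ]

# Base graphics: horizontal bars, with a wider left margin for the labels
op <- par(mar = c(4, 8, 1, 1))
bp <- barplot(imp$MeanDecreaseAccuracy, names.arg = imp$vars,
              horiz = TRUE, las = 1, xlab = "MeanDecreaseAccuracy")
par(op)  # restore the previous margins

# ggplot2: a single panel, bars ordered by importance
p <- ggplot(imp, aes(MeanDecreaseAccuracy,
                     reorder(vars, MeanDecreaseAccuracy))) +
  geom_col() +
  labs(x = "Mean decrease in accuracy", y = NULL) +
  theme_bw()
print(p)
```

For the wine model in the question, the same code should work with `wine.bag` in place of `iris.rf`.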

I also wouldn't give up on the dotchart, which (IMHO) is a cleaner visualization. Here are options that are more customized than the built-in output in your question:

dotchart(imp$MeanDecreaseAccuracy, imp$vars, 
         xlim=c(0,max(imp$MeanDecreaseAccuracy)), pch=16)

imp %>% 
  pivot_longer(cols=matches("Mean")) %>% 
  ggplot(aes(value, vars)) +
  geom_point() +
  facet_grid(. ~ name) +
  scale_x_continuous(limits=c(0,NA), expand=expansion(c(0,0.04))) +
  theme_bw() +
  theme(panel.grid.minor=element_blank(),
        panel.grid.major.x=element_blank(),
        panel.grid.major.y=element_line(),
        axis.title=element_blank())

You could also plot the values themselves instead of point markers. For example:

imp %>% 
  pivot_longer(cols=matches("Mean")) %>% 
  ggplot(aes(value, vars)) +
  geom_text(aes(label=round(value,1)), size=3) +
  facet_grid(. ~ name, scales="free_x") +
  scale_x_continuous(limits=c(0,NA), expand=expansion(c(0,0.06))) +
  theme_bw() +
  theme(panel.grid.minor=element_blank(),
        panel.grid.major.x=element_blank(),
        panel.grid.major.y=element_line(),
        axis.title=element_blank())
