daniellopez46

Reputation: 604

Plot variable importance of a Random Forest model in R

What am I doing wrong here? What does "subscript out of bounds" mean?

I got the code below (first block) from a Revolution R online seminar on data mining in R. I'm trying to apply it to an RF model I ran, but I can't get past what I think is the ordering of the variables. I just want to plot the importance of the variables.

I included a little more than needed below to give context, but the line that actually errors out is the third line of code. The second code block shows the error I get when I apply it to the data I am working with. Can anyone help me figure this out?

#-------------------------------------------------------------------------
# List the importance of the variables.
rn <- round(importance(model.rf), 2)
rn[order(rn[,3], decreasing=TRUE),]
# Plot variable importance
varImpPlot(model.rf, main="",col="dark blue")
title(main="Variable Importance Random Forest weather.csv",
            sub=paste(format(Sys.time(), "%Y-%b-%d %H:%M:%S"), Sys.info()["user"])) 
#--------------------------------------------------------------------------

My errors:

> rn[order(rn[,2], decreasing=TRUE),]
Error in order(rn[, 2], decreasing = TRUE) : subscript out of bounds

Upvotes: 4

Views: 4530

Answers (1)

Tim P

Reputation: 1383

Think I understand the confusion. I bet you a 4-finger Kit Kat that if you type in ncol(rn) you'll see that rn has 2 columns, not 3 as you might expect. The first "column" you're seeing on the screen isn't really a column - it's just the row names for the object rn. Type rownames(rn) to confirm this. The final column of rn that you want to order by is therefore rn[,2] rather than rn[,3]. The "subscript out of bounds" message comes up because you've asked R to order by column 3, but rn doesn't have a column 3.
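
To see this in isolation, here is a minimal base-R sketch. The matrix rn_demo, its row and column names, and its values are all made up for illustration (they are not the asker's data); it only mimics the shape of what importance() returns, so treat it as a rough demonstration rather than what your own object will contain.

```r
# Hypothetical stand-in for round(importance(model.rf), 2):
# a 2-column numeric matrix whose row names hold the variable names.
rn_demo <- matrix(
  c(3.1, 1.2, 2.4,     # first measure (made-up values)
    10.5, 4.0, 7.7),   # second measure (made-up values)
  ncol = 2,
  dimnames = list(c("var_a", "var_b", "var_c"), c("measure1", "measure2"))
)

ncol(rn_demo)      # the row names are not counted as a column
rownames(rn_demo)  # the "first column" seen on screen lives here

# Asking for a column that does not exist reproduces the error;
# tryCatch captures the message instead of stopping the script.
err <- tryCatch(rn_demo[, 3], error = function(e) conditionMessage(e))
print(err)

# Ordering by the last column, whatever its index happens to be.
last_col <- ncol(rn_demo)
sorted_demo <- rn_demo[order(rn_demo[, last_col], decreasing = TRUE), ]
print(sorted_demo)
```

Using ncol() to pick the column means the same line keeps working whether the importance matrix has one, two, or more columns.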

Here's my brief detective trail for anyone interested in what the "importance" object actually is... I installed the randomForest package, loaded it with library(randomForest), and then ran an example from the documentation online:

library(randomForest)  # provides randomForest() and importance()
set.seed(4543)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data=mtcars, ntree=1000, 
             keep.forest=FALSE, importance=TRUE)
importance(mtcars.rf)

Turns out the "importance" object in this case looks like this (first few rows only to save space):

       %IncMSE IncNodePurity
cyl  17.058932     181.70840
disp 19.203139     242.86776
hp   17.708221     191.15919
...

Obviously ncol(importance(mtcars.rf)) is 2, and the row names are likely to be the thing leading to confusion :)
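
One way to sidestep the column-counting problem altogether is to order by column name instead of position. The sketch below rebuilds just the three rows shown above by hand so that it runs without the randomForest package; imp and imp_sorted are names chosen for illustration. With a real model you would put importance(mtcars.rf) in place of the hand-built matrix, and the column names may differ (check colnames() first).

```r
# The three rows printed above, rebuilt by hand for illustration only.
imp <- matrix(
  c(17.058932, 19.203139, 17.708221,
    181.70840, 242.86776, 191.15919),
  ncol = 2,
  dimnames = list(c("cyl", "disp", "hp"), c("%IncMSE", "IncNodePurity"))
)

# Order by column name, so the code does not depend on column positions.
# drop = FALSE keeps the result a matrix even if only one row remains.
imp_sorted <- imp[order(imp[, "%IncMSE"], decreasing = TRUE), , drop = FALSE]
print(round(imp_sorted, 2))
```

If the name you ask for does not exist, R reports the same "subscript out of bounds" error, so it is still worth checking colnames() on your own object.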

Upvotes: 6
