Reputation: 604
What am I doing wrong here? What does "subscript out of bounds" mean?
I got the code below (first block) from a Revolution R online seminar on data mining in R. I'm trying to apply it to a random forest model I ran, but I can't get past what I think is the ordering of the variables. I just want to plot the importance of the variables.
I included a little more than needed below to give context, but the line that errors out is the third line of code. The second code block shows the error I get when I apply it to the data I am working with. Can anyone help me figure this out?
-------------------------------------------------------------------------
# List the importance of the variables.
rn <- round(importance(model.rf), 2)
rn[order(rn[,3], decreasing=TRUE),]
# Plot variable importance
varImpPlot(model.rf, main="", col="dark blue")
title(main="Variable Importance Random Forest weather.csv",
      sub=paste(format(Sys.time(), "%Y-%b-%d %H:%M:%S"), Sys.info()["user"]))
#--------------------------------------------------------------------------
My errors:
> rn[order(rn[,2], decreasing=TRUE),]
Error in order(rn[, 2], decreasing = TRUE) : subscript out of bounds
Upvotes: 4
Views: 4530
Reputation: 1383
I think I understand the confusion. I bet you a 4-finger Kit Kat that if you type in ncol(rn) you'll see that rn has 2 columns, not 3 as you might expect. The first "column" you're seeing on the screen isn't really a column; it's just the row names for the object rn. Type rownames(rn) to confirm this. The final column of rn that you want to order by is therefore rn[,2] rather than rn[,3]. The "subscript out of bounds" message comes up because you've asked R to order by column 3, but rn doesn't have a column 3.
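To make the row-names point concrete, here is a minimal sketch in base R. The matrix is a toy stand-in for rn and its values and names are made up purely for illustration; it is not the output of any real model.

```r
# Toy stand-in for rn: a 2-column numeric matrix whose variable names
# live in the row names. All values here are invented for illustration.
rn <- matrix(c(1.5, 3.2, 2.1,
               10, 30, 20),
             ncol = 2,
             dimnames = list(c("a", "b", "c"), c("score1", "score2")))

print(rn)      # looks like three "columns", but the leftmost one is row names
ncol(rn)       # the number of real columns
rownames(rn)   # the labels printed on the left

# Asking for a column past the last one raises the error from the question.
msg <- tryCatch(rn[, 3], error = function(e) conditionMessage(e))
msg
```

Running the same ncol() and rownames() checks on your own rn should show which column indices are actually valid before you call order().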
Here's my brief detective trail for anyone interested in what the "importance" object actually is... I installed the randomForest package, loaded it with library(randomForest), and then ran an example from the online documentation:
library(randomForest)
set.seed(4543)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data=mtcars, ntree=1000,
                          keep.forest=FALSE, importance=TRUE)
importance(mtcars.rf)
Turns out the "importance" object in this case looks like this (first few rows only to save space):
       %IncMSE IncNodePurity
cyl  17.058932     181.70840
disp 19.203139     242.86776
hp   17.708221     191.15919
...
Obviously ncol(importance(mtcars.rf)) is 2, and the row names are likely to be the thing leading to confusion :)
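One way to avoid guessing the column number is to order by column name, or by whatever the last column happens to be. The sketch below rebuilds just the three rows printed above by hand, so it runs without fitting a forest; the names imp, by_name and by_last are only illustrative.

```r
# The first three rows of importance(mtcars.rf) as printed above,
# rebuilt by hand so this runs without the randomForest package.
imp <- matrix(c(17.058932, 19.203139, 17.708221,
                181.70840, 242.86776, 191.15919),
              ncol = 2,
              dimnames = list(c("cyl", "disp", "hp"),
                              c("%IncMSE", "IncNodePurity")))

# Order by column name, so the code does not depend on column position.
by_name <- imp[order(imp[, "%IncMSE"], decreasing = TRUE), , drop = FALSE]

# Or order by the last column, however many columns there are.
# drop = FALSE keeps the result a matrix even if there is only one column.
by_last <- imp[order(imp[, ncol(imp)], decreasing = TRUE), , drop = FALSE]

by_name
by_last
```

The ncol() version should also cope with an importance matrix that has a single column, which I believe is what you get when the forest was fit without importance=TRUE.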
Upvotes: 6