goldisfine

Reputation: 4850

Get the most important variable names from varImp()

I am working with the function varImp() from the caret package.

I fit a tree and then use varImp() to see which variables are most important. I would like to extract the names of the most important variables from the output of varImp(). But the output appears to be a list, and I can't find a way to get the variable names, only the numerical scores of how important the variables are.

I have tried converting the output to a data frame and also using names() but neither allows me to get the important variable names.

Here's an example:

> library(nlme)    # Orthodont data set
> library(rpart)   # rpart()
> library(caret)   # varImp()
> # Sample data
> head(Orthodont)
Grouped Data: distance ~ age | Subject
  distance age Subject  Sex
1     26.0   8     M01 Male
2     25.0  10     M01 Male
3     29.0  12     M01 Male
4     31.0  14     M01 Male
5     21.5   8     M02 Male
6     22.5  10     M02 Male
> sample_tree <- rpart(distance ~ ., data = Orthodont)
> varImp(sample_tree)
          Overall
age     1.1178243
Sex     0.5457834
Subject 2.8446154
> names(varImp(sample_tree))
[1] "Overall"
> as.data.frame(varImp(sample_tree))
          Overall
age     1.1178243
Sex     0.5457834
Subject 2.8446154
> # What I want are the names of the two most important variables.

Upvotes: 1

Views: 5377

Answers (1)

meuleman

Reputation: 378

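The names you're looking for are in the rownames() of the object. varImp() returns a data frame with a single column, Overall, so names() only gives you that column name; the variable names are stored as the row names.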
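As a quick illustration of why names() came up empty: on a data frame, names() returns the column names, while the row names are stored separately. The sketch below builds a stand-in for varImp(sample_tree) by hand (the scores are copied from the question's output) so it runs without caret, rpart or nlme. It assumes, as the printed output suggests, that varImp() on an rpart fit returns a plain one-column data frame.

```r
# Hand-built stand-in for varImp(sample_tree): one column, Overall,
# with the predictor names as row names (scores copied from the question).
imp <- data.frame(Overall = c(1.1178243, 0.5457834, 2.8446154),
                  row.names = c("age", "Sex", "Subject"))

names(imp)     # column names only: "Overall"
rownames(imp)  # the variable names: "age" "Sex" "Subject"
```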

imp <- varImp(sample_tree)
rownames(imp)[order(imp$Overall, decreasing=TRUE)]

Output:

[1] "Subject" "age"     "Sex"
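To make the indexing step explicit: order() does not return the scores themselves but the row positions that would sort them, and those positions are then used to subset rownames(imp). A small sketch using the same kind of hand-built stand-in for varImp(sample_tree) (scores copied from the question, assumed to be a one-column data frame):

```r
imp <- data.frame(Overall = c(1.1178243, 0.5457834, 2.8446154),
                  row.names = c("age", "Sex", "Subject"))

# order() returns row positions, highest score first: 3 1 2
idx <- order(imp$Overall, decreasing = TRUE)

# Indexing the row names with those positions gives the names by importance
ranked <- rownames(imp)[idx]   # "Subject" "age" "Sex"
```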

So the two most important variables, according to these scores, are:

rownames(imp)[order(imp$Overall, decreasing=TRUE)[1:2]]

Which gives:

[1] "Subject" "age"
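If you need this repeatedly, a small wrapper keeps the ranking logic in one place. top_vars() below is a hypothetical helper (not part of caret); it assumes a varImp()-style data frame with the scores in a column named Overall and the variable names as row names. Using head() instead of [1:n] means that asking for more names than there are rows returns all of them rather than padding the result with NA.

```r
# top_vars() is a hypothetical helper, not a caret function.
# Returns the row names of the n highest-scoring rows of a
# varImp()-style data frame (scores assumed to be in column `col`).
top_vars <- function(imp, n = 2, col = "Overall") {
  idx <- order(imp[[col]], decreasing = TRUE)
  # head() returns fewer names if n exceeds the number of rows
  head(rownames(imp)[idx], n)
}

# Stand-in for varImp(sample_tree), scores copied from the question
imp <- data.frame(Overall = c(1.1178243, 0.5457834, 2.8446154),
                  row.names = c("age", "Sex", "Subject"))

top_vars(imp)          # "Subject" "age"
top_vars(imp, n = 10)  # "Subject" "age" "Sex" (no NA padding)
```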

Upvotes: 3
