Reputation: 407
I'm using randomForest to find the most significant variables. I was expecting output that reports the accuracy of the model and also ranks the variables by importance, but I am a bit confused now. I fitted a model with randomForest and then ran importance() to extract the importance of the variables.
Then I came across another command, rfcv (Random Forest Cross-Validation for feature selection), which I suppose is the most appropriate one for this purpose. My question about it is: how do I get the list of the most important variables? How do I see the output after running it? Which command should I use?
Another thing: what is the difference between randomForest and predict.randomForest?
I am not very familiar with randomForest or R, so any help would be appreciated.
Thank you in advance!
Upvotes: 1
Views: 6554
Reputation: 867
After you have built a randomForest model, you use predict.randomForest (normally called simply as predict() on the fitted object) to apply that model to new data. For example, build a random forest with your training data, then run your validation data through that model with predict.randomForest.
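A minimal sketch of that workflow, in case it helps. The built-in iris data, the seed, and the 100/50 split are just stand-ins I picked for illustration; swap in your own data frame and response column.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only so the split is reproducible

# iris is a placeholder for your own data; the 100/50 split is arbitrary
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
valid <- iris[-train_idx, ]

# Fit the forest; importance = TRUE also stores permutation-based importance
fit <- randomForest(Species ~ ., data = train, importance = TRUE)

# Printing the model shows the out-of-bag error estimate and confusion matrix
print(fit)

# One row per predictor; sort by a measure to get a ranking
imp <- importance(fit)
ranked <- imp[order(imp[, "MeanDecreaseGini"], decreasing = TRUE), ]
print(ranked)

# predict() dispatches to predict.randomForest for randomForest objects
pred <- predict(fit, newdata = valid)
acc <- mean(pred == valid$Species)  # share of validation rows predicted correctly
print(acc)
```

So randomForest builds the model and predict applies it. varImpPlot(fit) draws the same importance measures as a dot chart if you prefer a plot to a table.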
As for rfcv, there is an option recursive, which controls (from the help):
whether variable importance is (re-)assessed at each step of variable reduction
It's all in the help file.
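Here is a rough sketch of calling rfcv and looking at what it returns. Again iris and the seed are only placeholders, and cv.fold = 5 and recursive = FALSE are simply written out explicitly here. As far as I can tell, rfcv reports the cross-validated error for each number of variables rather than the variable names, so for the ranking itself you would still go to importance() on a fitted model.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only for reproducibility

# iris is a placeholder: predictors in x, response in y
x <- iris[, 1:4]
y <- iris$Species

# Cross-validated error as the number of predictors is reduced step by step
cv <- rfcv(trainx = x, trainy = y, cv.fold = 5, recursive = FALSE)

# Number of variables used at each step
print(cv$n.var)

# Cross-validated error for each of those steps
print(cv$error.cv)
```

If the error stays flat as n.var drops, that suggests a smaller set of predictors may do about as well as the full set.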
Upvotes: 4