spektra

Reputation: 407

How to do feature selection with randomForest package?

I'm using randomForest to find the most significant variables. I expected output that reports the accuracy of the model and also ranks the variables by importance, but I'm a bit confused now. I fit a model with randomForest and then ran importance() to extract the variable importances. Then I came across another function, rfcv (Random Forest Cross-Validation for feature selection), which I suppose is the most appropriate one for this purpose. My questions about it are: how do I get the list of the most important variables? How do I see the output after running it? Which command should I use?

Another thing: what is the difference between randomForest and predict.randomForest?

I am not very familiar with random forests or R, so any help would be appreciated.

Thank you in advance!

Upvotes: 1

Views: 6554

Answers (1)

dom_oh

Reputation: 867

randomForest fits the model; predict.randomForest applies a model you have already fitted to new data. For example, build a random forest with your training data, then run your validation data through that model with predict.randomForest.
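A minimal sketch of that fit-then-predict workflow, which also prints the accuracy and an importance ranking. The built-in iris data, the 100-row training split, and the object names (train, valid, rf, ranked) are assumptions for illustration only, not anything from the question:

```r
library(randomForest)

set.seed(42)

# iris is stand-in data; the 100/50 split is an arbitrary choice for illustration
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
valid <- iris[-idx, ]

# Fit the model on the training rows.
# importance = TRUE also stores the permutation-based importance measure.
rf <- randomForest(Species ~ ., data = train, importance = TRUE)

# predict() dispatches to predict.randomForest because rf has class "randomForest"
pred <- predict(rf, newdata = valid)

# Share of validation rows classified correctly
accuracy <- mean(pred == valid$Species)

# Rank variables by mean decrease in accuracy (type = 1), largest first
imp    <- importance(rf, type = 1)
ranked <- imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]

print(rf)        # includes the out-of-bag error estimate and confusion matrix
print(accuracy)
print(ranked)
```

You normally call predict() rather than predict.randomForest directly; R picks the right method from the class of the fitted object. varImpPlot(rf) gives a quick plot of the same ranking.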

As for rfcv, it has an option, recursive, which controls (from the help):

whether variable importance is (re-)assessed at each step of variable reduction

It's all in the help file (?rfcv).
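To see what rfcv returns, here is a rough sketch, again using iris as assumed stand-in data. rfcv reports the cross-validated error for each number of predictors kept (components n.var and error.cv), but it does not return the names of the variables. One way to get names, shown below as an assumption rather than the package's prescribed method, is to fit a separate randomForest and take the top-ranked variables from importance():

```r
library(randomForest)

set.seed(42)

# Stand-in data: predictors in x, class labels in y
x <- iris[, -5]
y <- iris$Species

# Cross-validated error as the number of predictors is reduced step by step.
# recursive = FALSE: importance is assessed once, not re-assessed at each step.
cv <- rfcv(trainx = x, trainy = y, cv.fold = 5, recursive = FALSE)

# n.var: number of predictors used at each step; error.cv: matching CV error
print(data.frame(n.var = cv$n.var, error.cv = cv$error.cv))

# Number of predictors with the lowest cross-validated error
best_n <- cv$n.var[which.min(cv$error.cv)]

# rfcv does not name the variables, so rank them with a separate fit
rf  <- randomForest(x, y, importance = TRUE)
imp <- importance(rf, type = 1)[, 1]   # mean decrease in accuracy
top_vars <- names(sort(imp, decreasing = TRUE))[seq_len(best_n)]

print(top_vars)
```

Plotting cv$n.var against cv$error.cv shows how the error changes as variables are dropped, which helps decide how many to keep.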

Upvotes: 4
