Reputation: 21
I have about 10,000 samples and 9,000 features. I am trying to use random forests (RF or GRF) for feature (variable) selection/reduction.
The approach works well when I use 700 features, but with 9,000, when I run randomForest or RRF, even with a single tree (and even with mtry=1), I wait for hours and nothing happens. (FYI, I use sampsize=800.)
I was hoping at least to be able to grow a single tree, then grow more trees on multiple computers and combine them.
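On the grow-separately-then-combine idea: in R, the randomForest package provides a combine() function that merges forests grown independently, so batches of trees grown on different machines can be pooled once a single tree finishes in reasonable time. The sketch below shows the idea in plain Python rather than R, under simplifying assumptions: 0/1 labels, numeric features, and one-split trees (stumps) instead of full trees. The function names (grow_forest, combine_forests, importance) are hypothetical. Each stump sees sampsize rows drawn with replacement and mtry randomly chosen features, and a feature's importance is its summed Gini decrease.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_stump(X, y, rows, candidate_features):
    """Best single split over the candidate features: (feature, threshold, decrease)."""
    parent = gini([y[r] for r in rows])
    best = (None, None, 0.0)  # feature None means no split improved impurity
    for f in candidate_features:
        for t in sorted({X[r][f] for r in rows}):
            left = [y[r] for r in rows if X[r][f] <= t]
            right = [y[r] for r in rows if X[r][f] > t]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if parent - child > best[2]:
                best = (f, t, parent - child)
    return best

def grow_forest(X, y, n_trees, sampsize, mtry, seed):
    """Hypothetical helper: grow n_trees stumps, each on a bootstrap of sampsize rows."""
    rng = random.Random(seed)
    n_rows, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n_rows) for _ in range(sampsize)]  # with replacement
        features = rng.sample(range(n_features), mtry)           # mtry random features
        forest.append(best_stump(X, y, rows, features))
    return forest

def combine_forests(*forests):
    """Merging independently grown forests is just pooling their trees."""
    merged = []
    for forest in forests:
        merged.extend(forest)
    return merged

def importance(forest, n_features):
    """Summed Gini decrease credited to each feature across all stumps."""
    scores = [0.0] * n_features
    for feature, _threshold, decrease in forest:
        if feature is not None:
            scores[feature] += decrease
    return scores

# Toy data: 200 rows, 20 features; only feature 3 determines the label.
data_rng = random.Random(0)
X = [[data_rng.random() for _ in range(20)] for _ in range(200)]
y = [1 if row[3] > 0.5 else 0 for row in X]

# Each "machine" grows its own batch with a different seed; then pool the batches.
part_a = grow_forest(X, y, n_trees=50, sampsize=80, mtry=4, seed=1)
part_b = grow_forest(X, y, n_trees=50, sampsize=80, mtry=4, seed=2)
forest = combine_forests(part_a, part_b)
scores = importance(forest, 20)
print(max(range(20), key=scores.__getitem__))  # index of the top-ranked feature
```

Because the batches use different seeds and share no state, they can run on separate computers; only the small per-tree results need to be sent back and pooled.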
Any ideas?
Roni
Upvotes: 2
Views: 819
Reputation: 1234
I have dealt with the same problem and solved it as follows:
This approach may lose some important features, but it generally selects the most informative ones. You can change the number of selected features (300 in the example) to suit your needs.
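The idea of keeping only a fixed number of the most informative features (for example 300) before running the forest can be sketched as a cheap univariate filter. This is a minimal sketch of one possible filter, not necessarily the one the answer used: it assumes numeric features and a 0/1 label, and scores each feature by the absolute Pearson correlation with the label. The helper names (abs_correlation, top_k_features, keep_columns) are hypothetical.

```python
import math

def abs_correlation(xs, ys):
    """|Pearson r| between one feature column and the label; 0.0 for constant columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)

def top_k_features(X, y, k=300):
    """Indices of the k features most correlated with y, best first."""
    n_features = len(X[0])
    scores = [abs_correlation([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

def keep_columns(X, columns):
    """Reduce every row to the selected columns before fitting the forest."""
    return [[row[j] for j in columns] for row in X]

# Tiny example: feature 1 is constant, feature 2 perfectly (negatively) tracks y.
X = [[0.1, 5.0, 1.0], [0.9, 5.0, 0.0], [0.2, 5.0, 1.0], [0.8, 5.0, 0.0]]
y = [0, 1, 0, 1]
cols = top_k_features(X, y, k=2)
print(cols)                   # [2, 0]
print(keep_columns(X, cols))  # [[1.0, 0.1], [0.0, 0.9], [1.0, 0.2], [0.0, 0.8]]
```

A filter like this costs one pass over each column, so it scales linearly in the 9,000 features; the trade-off, as noted above, is that a feature that only matters in combination with others can score low and be dropped.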
As far as I can tell, the only way to find the best feature subset without risking the loss of an important feature is brute force.
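To put the brute-force option in numbers: with p features there are 2^p candidate subsets, and C(p, k) subsets of exactly k features. The short sketch below (subset_counts is a hypothetical helper) computes both for the question's 9,000 features and the answer's 300-feature target; 2^9000 alone is a number with 2,710 decimal digits, far beyond anything that can be enumerated.

```python
import math

def subset_counts(n_features, k):
    """(number of all subsets, number of subsets with exactly k features)."""
    return 2 ** n_features, math.comb(n_features, k)

all_subsets, size_k = subset_counts(9000, 300)
print(len(str(all_subsets)))  # 2710 (digits in 2**9000)
print(len(str(size_k)))       # digits in C(9000, 300)
```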
Upvotes: 2