Reputation: 3919
I have about 300,000 rows of data and 10 features in my model, and I want to fit a random forest using the randomForest package in R.
To maximise the number of trees I can grow in a fixed window of time without hurting generalisation, what are sensible ranges for the parameters?
Upvotes: 0
Views: 299
Reputation: 6522
Usually you can get away with tuning just mtry, as explained here, and the default is often best:
https://stats.stackexchange.com/questions/50210/caret-and-randomforest-number-of-trees
There is also a function, tuneRF, in the randomForest package that searches for the optimal mtry. Note that it does not tune ntree: each trial uses a fixed number of trees, set by ntreeTry. It is explained here:
setting values for ntree and mtry for random forest regression model
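As a rough sketch of what a tuneRF call can look like (the data here is synthetic and only stands in for the real 300,000 x 10 table; the starting values are assumptions, not recommendations):

```r
library(randomForest)

set.seed(42)

# Synthetic stand-in for the real data: 10 numeric features, numeric response.
n <- 5000
x <- as.data.frame(matrix(rnorm(n * 10), ncol = 10))
y <- x$V1 + 2 * x$V2 + rnorm(n)

# tuneRF searches over mtry only; ntreeTry is the fixed forest size per trial.
res <- tuneRF(x, y,
              mtryStart  = 3,     # floor(p / 3), the regression default for p = 10
              ntreeTry   = 50,    # trees grown for each candidate mtry
              stepFactor = 2,     # mtry is divided/multiplied by this at each step
              improve    = 0.05,  # minimum relative OOB improvement to keep searching
              trace      = FALSE,
              plot       = FALSE)

# res is a matrix with columns "mtry" and "OOBError"; pick the row with lowest error.
best_mtry <- res[which.min(res[, "OOBError"]), "mtry"]
best_mtry
```

If the chosen mtry turns out to be the default, that is consistent with the point above that the default is often good enough.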
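The time it takes you will have to test yourself; it is going to scale with the product folds * tuning * ntrees (number of cross-validation folds, number of tuning settings tried, and number of trees per forest).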
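One way to get a ballpark figure is to time a small pilot forest and extrapolate, since training time is roughly linear in the number of trees. This is only a sketch: the data is synthetic, and pilot_trees, folds, n_tuning, ntree_target and budget_sec are hypothetical values chosen for illustration. On the real data the pilot should be run on the full 300,000 rows, because the cost per tree grows with the number of rows.

```r
library(randomForest)

set.seed(1)

# Synthetic stand-in; replace with the real data for a meaningful timing.
n <- 5000
x <- as.data.frame(matrix(rnorm(n * 10), ncol = 10))
y <- x$V1 + rnorm(n)

# Time a small pilot forest to estimate the cost of a single tree.
pilot_trees <- 20
elapsed <- system.time(
  randomForest(x, y, ntree = pilot_trees, mtry = 3)
)[["elapsed"]]
sec_per_tree <- elapsed / pilot_trees

# Hypothetical tuning plan: 5 folds, 3 candidate settings, 500 trees each.
folds        <- 5
n_tuning     <- 3
ntree_target <- 500
est_seconds  <- folds * n_tuning * ntree_target * sec_per_tree

# Conversely: how many trees per forest fit into a fixed time budget?
budget_sec      <- 3600
trees_in_budget <- floor(budget_sec / (folds * n_tuning * sec_per_tree))

c(sec_per_tree = sec_per_tree, est_seconds = est_seconds)
```

The estimate ignores fixed overhead and the slightly smaller training sets inside each fold, so treat it as an upper-end guess rather than a precise number.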
The only speculative point I would add is that, with 300,000 rows of data, you might be able to reduce the runtime without losing predictive accuracy by growing each tree on a smaller bootstrap sample of the data (the sampsize argument of randomForest).
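A minimal way to check whether that holds on your own data is to compare runtime and out-of-bag error for a reduced sampsize against the default. The data below is synthetic and the value 1000 is just an assumption for illustration; whether accuracy is really preserved depends on the problem.

```r
library(randomForest)

set.seed(7)

# Synthetic stand-in for the real data.
n <- 10000
x <- as.data.frame(matrix(rnorm(n * 10), ncol = 10))
y <- x$V1 - x$V2 + rnorm(n)

ntree <- 50

# Default: each tree is grown on a bootstrap sample of size nrow(x).
t_full <- system.time(
  fit_full <- randomForest(x, y, ntree = ntree)
)[["elapsed"]]

# Reduced: each tree sees only 1000 rows drawn from the data.
t_small <- system.time(
  fit_small <- randomForest(x, y, ntree = ntree, sampsize = 1000)
)[["elapsed"]]

# Compare runtime and final out-of-bag MSE for the two settings.
data.frame(
  setting = c("full", "sampsize = 1000"),
  seconds = c(t_full, t_small),
  oob_mse = c(fit_full$mse[ntree], fit_small$mse[ntree])
)
```

If the OOB error is close for both settings, the time saved per tree can be spent on growing more trees within the same budget.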
Upvotes: 2