mjoudy

Reputation: 149

Time needed to train a predictive model in R

I have a dataset with about 20k rows and 160 columns. After some simple preprocessing, such as removing near-zero-variance variables and variables with a high proportion of NAs, I kept only 56 columns as features. Now I want to train a model on this data with the random forest method, but after about an hour it still hadn't finished, so I aborted it.

Is there any code I can use to predict the time needed to train the model based on my PC's configuration? Typically, how long does it take to train a random forest or rpart model on a dataset of these dimensions?

Upvotes: 2

Views: 328

Answers (2)

agenis

Reputation: 8377

You can use the GuessCompx package to estimate the empirical complexity and computation time of your randomForest call. Let's create fake data of the same size as yours:

# simulate a data frame the same size as the real one: 20,000 rows x 56 integer columns
df = data.frame(matrix(rpois(20000*56, 3), ncol=56))

Then load the libraries:

library(GuessCompx)
library(randomForest)

Run the test; you get an N*log(N) time complexity:

CompEst(df, randomForest)
#### $`TIME COMPLEXITY RESULTS`$best.model
#### [1] "NLOGN"
#### $`TIME COMPLEXITY RESULTS`$computation.time.on.full.dataset
#### [1] "3M 30.31S"
#### $`MEMORY COMPLEXITY RESULTS`
#### $`MEMORY COMPLEXITY RESULTS`$best.model
#### [1] "QUADRATIC"
#### $`MEMORY COMPLEXITY RESULTS`$memory.usage.on.full.dataset
#### [1] "14033 Mb"
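
If your real task is supervised rather than unsupervised, you can wrap the fit in a one-argument function so CompEst can apply it to subsets of the data. A minimal sketch, assuming the outcome is column X1 of df (the formula is an illustration, not part of the original answer):

# hypothetical supervised wrapper: assumes X1 is the outcome column
CompEst(df, function(d) randomForest(X1 ~ ., data = d))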

It seems that time is not the problem so much as memory: the estimated 14 GB reaches the system's limit, which gets in the way and can slow the algorithm down a lot (the 3.5 minutes predicted for the full dataset were exceeded in practice because of the memory pressure; it took 12 minutes for me). Try to increase memory.limit() as much as you can.
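
For example, on Windows you could check and raise the limit before training (a minimal sketch; memory.limit() is Windows-only, and the 16000 MB value is an assumption to adapt to your machine):

memory.limit()               # current limit in MB (Windows-only)
memory.limit(size = 16000)   # raise it to ~16 GB if the hardware allows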

Upvotes: 1

quickreaction

Reputation: 685

Try setting some parameters for the randomForest function. Start with a small number of trees (ntree), a small number of variables drawn at each split (mtry), and/or a small maximum number of terminal nodes, i.e. "leaves" (maxnodes). Then change the parameters to increase your model's complexity and accuracy. Starting small and slowly increasing the parameters also keeps the computation fast while you observe their effect on performance; see the sketch below.
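
A minimal sketch of that progression (the outcome column y and the parameter values are assumptions to adapt to your data):

library(randomForest)

# start small: few trees, few candidate variables per split, shallow trees
fit_small <- randomForest(y ~ ., data = df,
                          ntree = 50, mtry = 4, maxnodes = 16)

# then scale up gradually, timing each run to see the cost of added complexity
system.time(
  fit_big <- randomForest(y ~ ., data = df, ntree = 500, mtry = 8)
)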

Note: if you're using randomForest for feature selection (which is why I use it), use a large ntree, a low mtry, and a low maxnodes so you can extract good information about the individual variables, as in the sketch below.
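
For instance, with importance = TRUE you can pull out the per-variable importance measures afterwards (a minimal sketch; y and the parameter values are again assumptions):

# grow many small trees and record importance for feature selection
fit <- randomForest(y ~ ., data = df, importance = TRUE,
                    ntree = 1000, mtry = 2, maxnodes = 8)
importance(fit)    # per-variable importance measures
varImpPlot(fit)    # quick visual ranking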

Upvotes: 2
