Reputation: 2932
I am trying to create a random forest from ~100 thousand inputs. To accomplish this, I am using train from the caret package with method = "parRF". Unfortunately, my machine, with 128 GB of memory, still runs out. Therefore, I need to cut down on how much memory I use.
Right now, the training method I am running is:
> trControl <- trainControl(method = "LGOCV", p = 0.9, savePredictions = T)
> model_parrf <- train(x = data_preds, y = data_resp, method = "parRF",
                       trControl = trControl)
However, because each forest is kept, the system quickly runs out of memory. If my understanding of train and randomForest is correct, each random forest that is built stores at least about 500 * 100,000 doubles (at 8 bytes per double, roughly 400 MB per forest). Therefore, I would like to throw away the random forests I no longer need. I tried passing keep.forest = FALSE through to randomForest using
> model_parrf <- train(x = data_preds, y = data_resp, method = "parRF",
                       trControl = trControl, keep.forest = FALSE)
Error in train.default(x = data_preds, y = data_resp, method = "parRF", :
final tuning parameters could not be determined
In addition, this warning was thrown repeatedly:
In eval(expr, envir, enclos) :
predictions failed for Resample01: mtry=2 Error in predict.randomForest(modelFit, newdata) :
No forest component in the object
It seems that for some reason, caret requires the forests to be kept in order to compare models. Is there any way I can use caret with less memory?
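For scale, the per-forest cost estimated above can be checked on a single standalone fit. A minimal sketch, assuming randomForest is run directly on the same data_preds / data_resp (ntree = 500 is the package default, matching the estimate; the rf_full / rf_slim names are just for illustration):

library(randomForest)

# Full fit keeps the forest; object.size shows the footprint the estimate predicts.
rf_full <- randomForest(x = data_preds, y = data_resp, ntree = 500)
print(object.size(rf_full), units = "Mb")

# Dropping the forest shrinks the object, but then nothing is left to predict
# on the held-out data, which is why keep.forest = FALSE fails inside train.
rf_slim <- randomForest(x = data_preds, y = data_resp, ntree = 500,
                        keep.forest = FALSE)
print(object.size(rf_slim), units = "Mb")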
Upvotes: 2
Views: 1317
Reputation: 14316
Keep in mind that if you use M cores, you need to store the data set M+1 times (one copy per worker plus one in the master session). Try using fewer workers.
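With parRF, the worker count comes from whichever foreach backend is registered before train is called, so it can be capped explicitly. A minimal sketch, assuming the doParallel backend and the same data_preds / data_resp objects as in the question (the choice of 2 workers is only an illustration):

library(caret)
library(doParallel)

# Each PSOCK worker holds its own copy of the data, so 2 workers means
# roughly 3 copies in memory (2 workers + the master session).
cl <- makeCluster(2)
registerDoParallel(cl)

trControl <- trainControl(method = "LGOCV", p = 0.9, savePredictions = TRUE)
model_parrf <- train(x = data_preds, y = data_resp, method = "parRF",
                     trControl = trControl)

stopCluster(cl)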
Upvotes: 1