Jon Claus

Reputation: 2932

R - Decreasing memory usage when using caret to train a random forest

I am trying to create a random forest given ~100 thousand inputs. To accomplish this, I am using train from the caret package with method = "parRF". Unfortunately, my machine with 128 GB of memory still runs out. Therefore, I need to cut down on how much memory I use.

Right now, the training method I am running is:

> trControl <- trainControl(method = "LGOCV", p = 0.9, savePredictions = T)
> model_parrf <- train(x = data_preds, y = data_resp, method = "parRF",
                     trControl = trControl)

However, because each forest is kept, the system quickly runs out of memory. If my understanding of train and randomForest is correct, each random forest that is built stores about 500 * 100,000 doubles at the very least. Therefore, I would like to throw away the random forests I no longer need. I tried passing keep.forest = FALSE through to randomForest using

> model_parrf <- train(x = data_preds, y = data_resp, method = "parRF",
                       trControl = trControl, keep.forest = FALSE)
Error in train.default(x = data_preds, y = data_resp, method = "parRF",  : 
  final tuning parameters could not be determined

In addition, this warning was thrown repeatedly:

In eval(expr, envir, enclos) :
  predictions failed for Resample01: mtry=2 Error in predict.randomForest(modelFit, newdata) : 
  No forest component in the object

It seems that for some reason, caret requires the forests to be kept in order to compare models. Is there any way I can use caret with less memory?

Upvotes: 2

Views: 1317

Answers (1)

topepo

Reputation: 14316

Keep in mind that, if you use M cores, you need to store the data set M+1 times. Try using fewer workers.
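
As a minimal sketch (not from the original answer), assuming the doParallel backend, you could register a small cluster yourself before calling train so that only a couple of copies of the data are held at once:

    ## Hypothetical example: cap the number of parallel workers.
    ## With M workers the training data is held roughly M+1 times,
    ## so a smaller cluster lowers the peak memory footprint at the
    ## cost of slower training.
    library(caret)
    library(doParallel)

    cl <- makeCluster(2)        # e.g. 2 workers instead of all available cores
    registerDoParallel(cl)

    model_parrf <- train(x = data_preds, y = data_resp, method = "parRF",
                         trControl = trControl)

    stopCluster(cl)
    registerDoSEQ()             # return to sequential execution afterwards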

Upvotes: 1
