John

Reputation: 351

How do I make a randomForest model size smaller?

I've been training randomForest models in R on 7 million rows of data (41 features). Here's an example call:

myModel <- randomForest(RESPONSE~., data=mydata, ntree=50, maxnodes=30)

I thought surely with only 50 trees and 30 terminal nodes that the memory footprint of "myModel" would be small. But it's 65 megs in a dump file. The object seems to be holding all sorts of predicted, actual, and vote data from the training process.

What if I just want the forest and that's it? I want a tiny dump file that I can load later to make predictions off of quickly. I feel like the forest by itself shouldn't be all that large...

Anyone know how to strip this sucker down to just something I can make predictions off of going forward?
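
One sketch of the idea (hedged; the component names are the standard slots documented in ?randomForest): train as usual, then NULL out the bulky per-observation training artifacts before saving. predict.randomForest works off the retained $forest component, so dropping the rest should shrink the serialized object considerably.

slimModel <- myModel
slimModel$predicted <- NULL   # OOB predictions, one per training row
slimModel$votes     <- NULL   # per-row class vote matrix (classification)
slimModel$oob.times <- NULL   # per-row out-of-bag counts
slimModel$y         <- NULL   # stored copy of the response
save(slimModel, file = "slimModel.RData")
predict(slimModel, newdata = mydata)  # still works off the kept $forest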

Upvotes: 5

Views: 3216

Answers (2)

Satish Chilloji

Reputation: 56

You can use the tuneRF function in R to tune the forest: it searches for the optimal mtry at a fixed number of trees (ntreeTry), which helps keep the model size down.

tuneRF(data_train[, names(data_train) != "Response"], data_train$Response,
       stepFactor = 1.2, improve = 0.01, plot = TRUE, trace = TRUE)

Use ?tuneRF to learn more about its arguments.
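
For instance (a sketch reusing the data_train/Response names from the call above): with doBest = TRUE, tuneRF returns a randomForest fit at the best mtry it found, and ntreeTry fixes how many trees are grown during the search.

## Exclude the response column from the predictor set
predictors <- data_train[, names(data_train) != "Response"]

## doBest = TRUE returns a randomForest fit with the optimal mtry;
## ntreeTry sets the number of trees grown while tuning
bestFit <- tuneRF(predictors, data_train$Response,
                  ntreeTry = 50, stepFactor = 1.2, improve = 0.01,
                  trace = TRUE, plot = TRUE, doBest = TRUE)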

Upvotes: 1

Joshua Ulrich

Reputation: 176648

Trying to get out of the habit of posting answers as comments...

?randomForest advises against using the formula interface with large numbers of variables... are the results any different if you don't use the formula interface? The Value section of ?randomForest also tells you how to turn off some of the output (importance matrix, the entire forest, proximity matrix, etc.).

For example:

myModel <- randomForest(mydata[, names(mydata) != "RESPONSE"],
  mydata$RESPONSE, ntree=50, maxnodes=30, importance=FALSE,
  localImp=FALSE, proximity=FALSE, keep.inbag=FALSE,
  keep.forest=TRUE)  # keep.forest must stay TRUE to predict later
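
To verify the savings (a quick sketch; object.size measures the in-memory footprint, while saveRDS shows the on-disk size that a dump file would reflect):

## In-memory size of the fitted object
format(object.size(myModel), units = "MB")

## On-disk size after serializing
saveRDS(myModel, "myModel.rds")
file.size("myModel.rds") / 2^20  # size in MB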

Upvotes: 1
