Reputation: 10360
I am trying to predict a randomForest object onto a huge RasterLayer (34 million cells, 120+ layers). To do so, I use the clusterR function from the raster package. However, when I start predicting with the previously calculated randomForest object, it is loaded into every parallel worker, so all the processes combined need a lot of memory.
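For reference, the prediction step looks roughly like this (the stack name s and the number of workers are placeholders, not my actual values):

library(raster)
library(randomForest)

# s: RasterStack / RasterBrick holding the 120+ predictor layers (placeholder name)
beginCluster(4)  # worker count chosen arbitrarily for this sketch
pred <- clusterR(s, raster::predict, args = list(model = rfo))
endCluster()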
Is it possible to reduce the size of a randomForest object without losing the model? Does anyone have experience with this?
I create the model like this:
library(randomForest)

set.seed(42)

# dummy training data: a 3-class factor response and 100 random predictors
df <- data.frame(class = sample(x = 1:3, size = 10000, replace = TRUE))
str(df)
for (i in 1:100) {
  df <- cbind(df, runif(10000))
}
colnames(df) <- c("class", 1:100)
df$class <- as.factor(df$class)

rfo <- randomForest(x = df[, 2:ncol(df)],
                    y = df$class,
                    ntree = 500,
                    do.trace = 10)

object.size(rfo)
# 57110816 bytes
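To illustrate what I mean by "reducing the size": something like the snippet below, where per-sample bookkeeping is dropped. Whether predict() still works after removing these components is exactly what I am unsure about, so the choice of elements here is just a guess:

# copy of the model with (presumably) prediction-irrelevant parts removed
rfo_small <- rfo
rfo_small$predicted <- NULL  # OOB predictions for the training samples
rfo_small$votes     <- NULL  # OOB class vote matrix
rfo_small$oob.times <- NULL  # number of times each sample was out-of-bag
rfo_small$err.rate  <- NULL  # per-tree OOB error rates
rfo_small$y         <- NULL  # training response
object.size(rfo_small)       # compare against the 57110816 bytes above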
Upvotes: 1
Views: 706