Reputation: 93
I'm attempting to combine multiple random forests in R using the randomForest 'combine' function, but cannot do so with the randomForest output from the 'caret' package wrapper.
The object returned has the class 'train', not 'randomForest' - Any ideas please?
I am unclear how to retrieve the randomForest objects after running caret's 'train' function, which I believe should contain them.
The reason for this is that I'm running analyses on a large data set, too big to run randomForest on with my hardware.
To manage the dataset with the available memory I've produced many smaller forests first and then combined them using the randomForest 'combine' function. The results are good, and I want to do the same with the outputs from caret.
An outline of the problem code (I would rather use an apply function than a loop, but I'm also unclear on how to apply one to this example):
trainData.Slices <- list() #My data is 'sliced' into manageable pieces, each one being run through randomForest individually before being recombined
trainData.Slices[[1]] <- data.frame("y.val" = runif(1000, 0, 1), pred1 = runif(1000, 1, 5), pred2 = runif(1000, 10, 20))
trainData.Slices[[2]] <- data.frame("y.val" = runif(1000, 0, 1), pred1 = runif(1000, 1, 5), pred2 = runif(1000, 10, 20))
trainData.Slices[[3]] <- data.frame("y.val" = runif(1000, 0, 1), pred1 = runif(1000, 1, 5), pred2 = runif(1000, 10, 20))
slicesRun <- length(trainData.Slices) #Specify how many slices to cut the data into for individual processing
forestList <- list() #The list into which each small forest will be added
nVar <- length(trainData.Slices[[1]])
for (i in 1:slicesRun) {
  trainData <- trainData.Slices[[i]]
  #The standard randomForest code works perfectly
  forestList[[i]] <- randomForest(x=trainData[,-1], y=trainData[,1], ntree=200, importance=TRUE, proximity=TRUE)
  print(class(forestList[[i]]))
  #caret is returning 'train' objects rather than randomForest objects
  forestList_caret[[i]] <- train(y=trainData[,1], x=trainData[,-1], method="rf", trControl=trainControl(method="cv", number=5), prox=TRUE, allowParallel=TRUE)
  print(class(forestList_caret[[i]]))
  #How can the rf objects be returned instead, or train objects combined?
}
rf.all <- do.call("combine",forestList) #Combine the forests into one
rf.all_caret <- do.call("combine",forestList) #Combine the forests into one
Upvotes: 2
Views: 710
Reputation: 310
I also had this issue and found the following from this post: Error when using predict() on a randomForest object trained with caret's train() using formula
The randomForest object is in $finalModel, so forestList_caret[[i]]$finalModel in your example. Your code works with the following changes:
Change the initialisation forestList <- list() to forestList <- forestList_caret <- list()
Change the final rf.all_caret line to rf.all_caret <- do.call("combine",forestList_caret)
Insert inside the loop, after the train() call:
forestList_caret[[i]] <- forestList_caret[[i]]$finalModel
print(class(forestList_caret[[i]]))
Storing the $finalModel object lets you combine them at the end, and the result is an object with class randomForest. Check with:
print(class(rf.all_caret))
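Since the question also asked about using an apply function instead of a loop, here is a minimal sketch of the same idea with lapply. It is only an illustration under some assumptions of mine, not a drop-in for your real data: makeSlice is a hypothetical helper that fakes small slices, ntree = 50 and tuneGrid = data.frame(mtry = 1) are arbitrary choices to keep the run short, and I call randomForest::combine with its namespace in case another loaded package also defines a function named combine.

```r
library(randomForest)
library(caret)

set.seed(42)  # assumption: fixed seed only so the sketch is repeatable

# Hypothetical helper: builds a toy slice shaped like the data in the question
makeSlice <- function(n = 200) {
  data.frame(y.val = runif(n, 0, 1),
             pred1 = runif(n, 1, 5),
             pred2 = runif(n, 10, 20))
}
trainData.Slices <- list(makeSlice(), makeSlice(), makeSlice())

# Fit one caret model per slice and keep only the underlying randomForest object
forestList_caret <- lapply(trainData.Slices, function(trainData) {
  fit <- train(x = trainData[, -1], y = trainData[, 1],
               method = "rf",
               ntree = 50,                        # passed through to randomForest()
               tuneGrid = data.frame(mtry = 1),   # assumption: fixed mtry, no tuning
               trControl = trainControl(method = "cv", number = 5))
  fit$finalModel  # class 'randomForest', not 'train'
})

# Combine the per-slice forests into one randomForest object
rf.all_caret <- do.call(randomForest::combine, forestList_caret)
print(class(rf.all_caret))
print(rf.all_caret$ntree)
```

Note that keeping only $finalModel discards the resampling results stored in the 'train' object, so if you need the cross-validation summaries, save them separately before extracting the forest.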
Upvotes: 1