user2947767

How can I get the OOB samples used for each tree in a random forest model in R?

Is it possible to get the OOB samples used by the random forest algorithm for each tree? I'm using R. I know that the random forest algorithm uses roughly two-thirds of the data (sampled at random) to grow each tree and the remaining third as OOB samples to measure the OOB error, but I don't know how to get those OOB samples for each tree.

Any ideas?

Answers (1)

jmuhlenkamp

Assuming you are using the randomForest package, you just need to set the keep.inbag argument to TRUE.

library(randomForest)
set.seed(1)
rf <- randomForest(Species ~ ., iris, keep.inbag = TRUE)

The fitted object will then contain an n by ntree matrix (one row per training observation, one column per tree) that can be accessed by the name inbag.

dim(rf$inbag)
# [1] 150 500

rf$inbag[1:5, 1:3]
#   [,1] [,2] [,3]
# 1    0    1    0
# 2    1    1    0
# 3    1    0    1
# 4    1    0    1
# 5    0    0    2

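The values in the matrix tell you how many times a sample was in-bag for each tree. For example, the value of 2 in row 5, column 3 above says that the 5th observation was included in-bag twice for the 3rd tree. A value of 0 means the observation was not drawn for that tree, so the OOB samples for tree j are exactly the rows where rf$inbag[, j] == 0.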
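To turn that into per-tree OOB sets, here is a minimal sketch that collects the row indices with a 0 in each column of inbag. It assumes the rows of inbag line up one-to-one with the rows of the training data (true for iris, which has no missing values; if na.action drops rows, the indices refer to the rows actually used). The names oob_idx and tree3_oob_error are just illustrative. The last part uses the predict.all argument of predict.randomForest to score a single tree on its own OOB rows.

```r
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris, keep.inbag = TRUE)

# A row is out-of-bag (OOB) for tree j when it was drawn 0 times for that tree.
oob_idx <- lapply(seq_len(rf$ntree), function(j) which(rf$inbag[, j] == 0))

# Row indices (into the training data) that were OOB for the 3rd tree
head(oob_idx[[3]])

# The OOB observations themselves for the 3rd tree
oob_tree3 <- iris[oob_idx[[3]], ]

# Number of OOB rows per tree
oob_sizes <- lengths(oob_idx)
summary(oob_sizes)

# Optional: score tree 3 on its own OOB rows. predict.all = TRUE keeps the
# per-tree predictions in the $individual matrix (one column per tree).
pred <- predict(rf, newdata = iris, predict.all = TRUE)
tree3_pred <- pred$individual[oob_idx[[3]], 3]
tree3_oob_error <- mean(tree3_pred != as.character(iris$Species[oob_idx[[3]]]))
tree3_oob_error
```

Storing the indices in a list (rather than a logical matrix) is convenient because each tree's OOB set has a different size.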

As a bit of background, a sample can show up in-bag more than once (hence the 2 in row 5, column 3) because by default the sampling is done with replacement.
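Sampling with replacement is also where the "about a third is OOB" rule of thumb comes from: drawing n rows with replacement from n leaves a given row out with probability (1 - 1/n)^n, which approaches exp(-1), roughly 0.368. The sketch below compares that figure with the fraction of zeros actually observed in inbag; it assumes the default settings (replace = TRUE, sampsize equal to the number of rows).

```r
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris, keep.inbag = TRUE)

n <- nrow(iris)

# Probability that a given row is never drawn in n draws with replacement.
# For n = 150 this is about 0.367; it tends to exp(-1) as n grows.
expected_oob <- (1 - 1 / n)^n

# Observed fraction of (row, tree) cells that are out-of-bag
observed_oob <- mean(rf$inbag == 0)

round(c(expected = expected_oob, observed = observed_oob), 3)
```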

You can also sample without replacement via the replace parameter.

set.seed(1)
rf2 <- randomForest(Species ~ ., iris, keep.inbag = TRUE, replace = FALSE)

And now we can verify that without replacement, the maximum number of times any sample is included is once.

# with replacement, the maximum number of times a sample is included in a tree is 7
max(rf$inbag)
# [1] 7

# without replacement, the maximum number of times a sample is included in a tree is 1
max(rf2$inbag)
# [1] 1
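Without replacement, the in-bag matrix is purely 0/1, and every tree gets the same number of OOB rows. The sketch below checks this, assuming the package's default sampsize, which the randomForest documentation gives as ceiling(.632 * nrow(x)) when replace = FALSE (95 in-bag and 55 OOB rows per tree for the 150 iris rows).

```r
library(randomForest)

set.seed(1)
rf2 <- randomForest(Species ~ ., data = iris, keep.inbag = TRUE, replace = FALSE)

# Without replacement, inbag holds only 0/1 indicators
table(rf2$inbag)

# Rows drawn per tree and rows left out-of-bag per tree
in_bag_per_tree <- colSums(rf2$inbag)
oob_per_tree <- colSums(rf2$inbag == 0)

unique(in_bag_per_tree)
unique(oob_per_tree)
```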