Reputation: 125
I am building a random forest in R using the randomForest package. All my features are categorical. For example, my feature "voting method in 2020 general election" has the responses {"", "AB", "AP", "MB", "P"}. I would like to know whether my trees are generally splitting between the empty string and the other responses (which would indicate that the voting method matters less than whether a vote was recorded at all).
I have been examining forest$xbestsplit, which seems to contain what I need, but I'm not sure how to interpret it. It gives me a column for each of my 500 trees, and each column has some number of rows. I'm not sure whether the rows represent nodes, or how to interpret the given numeric values, since my responses are categorical.
I built a forest with just one feature as an example:
library(randomForest)

# Toy data set with a single categorical feature.
mini_data <- data.frame(vote_method = c('MB', 'MB', '', 'MB', 'MB', 'MB', 'AP', '', 'AP', 'MB'),
                        target = c(1, 1, 0, 1, 1, 0, 0, 0, 1, 1))
mini_data$vote_method <- as.factor(mini_data$vote_method)
mini_data$target <- as.factor(mini_data$target)

forest <- randomForest(target ~ ., data = mini_data)
forest$forest$xbestsplit[, 1]
I think these should be the split values for all the nodes in the first tree. The output was: 1 0 0 0 0
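As a sanity check, I also printed the first tree with getTree, which (if I read the docs right) shows the same split points alongside the node structure:

# Same tree in tabular form; the 'split point' column should match
# xbestsplit[, 1], and status -1 marks terminal nodes.
getTree(forest, k = 1, labelVar = TRUE)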
The randomForest documentation (for the function getTree) has this note on split points for categorical variables:
For categorical predictors, the splitting point is represented by an integer, whose binary expansion gives the identities of the categories that go to the left. For example, if a predictor has four categories and the split point is 13, the binary expansion of 13 is (1, 0, 1, 1) (because 13 = 1*2^0 + 0*2^1 + 1*2^2 + 1*2^3), so cases with categories 1, 3, or 4 in this predictor get sent to the left, and the rest to the right.
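To make sure I understand that encoding, here is a small helper I sketched that decodes a split point into the categories sent left (decode_split is my own name; it assumes categories are numbered 1..n in factor-level order, as the docs describe):

# Decode an integer split point into the category labels that go left.
# Bit i (counting from the least significant bit) corresponds to category i.
decode_split <- function(split_point, category_levels) {
  goes_left <- bitwAnd(split_point, 2^(seq_along(category_levels) - 1)) != 0
  category_levels[goes_left]
}

# The documentation's example: four categories, split point 13.
decode_split(13, c("cat1", "cat2", "cat3", "cat4"))
# "cat1" "cat3" "cat4" -- categories 1, 3, and 4 go left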
Under this explanation, what do 0 and 1 mean?
Upvotes: 2
Views: 704
Reputation: 10627
You want to verify the hypothesis that it is more important to have any vote recorded (yes/no) than the actual value of the vote method. You can look at feature importance, e.g. with varImpPlot(forest), which is based on all the splits. Keep in mind that this Gini feature importance is not additive. Moreover, the levels other than "" are likely selected more often during fitting simply because there are so many of them. Therefore, I would advise fitting two models instead: one with a yes/no feature and another with all the individual values. Then you can see whether the predictive power, measured e.g. by accuracy, sensitivity, or specificity, is higher in the model with only the one yes/no feature. A model is used to make predictions, and the internal splitting structure is only indirectly relevant for that purpose. Performance is a more direct way to compare models.
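A minimal sketch of that two-model comparison, assuming your full data is in a data frame df with the factor columns vote_method and target (names are illustrative):

library(randomForest)

# Collapse the vote-method levels into a single yes/no feature:
# "" means no vote recorded, anything else means a vote was recorded.
df$voted <- factor(ifelse(df$vote_method == "", "no", "yes"))

# Model 1: only whether a vote was recorded.
forest_voted <- randomForest(target ~ voted, data = df)

# Model 2: the individual vote-method values.
forest_method <- randomForest(target ~ vote_method, data = df)

# Compare out-of-bag performance, e.g. via the confusion matrices
# (the last column is the per-class error rate).
forest_voted$confusion
forest_method$confusion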
Upvotes: 0