user1369206

Reputation: 1

text rpart decision tree model -- how to suppress long list of values at each split node

I am building a decision tree model in which all predictors are categorical variables. Some of these variables have over 100 possible values.

Here is my code:

library(rpart)
model = rpart(score ~ ., data = dataset)
plot(model)
text(model)

The problem is that text(model) annotates each split node with a long list of values for the corresponding categorical variable. The values overlap each other and are hard to read. I am looking for an option for text(model) that displays only the variable name and suppresses all the values. That way the plotted tree is at least clear and shows which variables are used at each node.

Thanks in advance!

Leo

Upvotes: 0

Views: 3533

Answers (1)

Adam Kimberley

Reputation: 899

The prp function in the rpart.plot package might help.

It has a number of options for plotting different tree layouts, and you can abbreviate the factor levels shown at each split using the faclen argument.

Something like:

library(rpart.plot)
model = rpart(score ~., data = dataset)

prp(model, faclen = 2)

This should tidy up the plot. Note that faclen = 1 is a special value: each factor level is shown as a single letter (a for the first level, b for the second, and so on).
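
If abbreviating is not enough and you want only the variable name at each split, prp also accepts a split.fun argument that lets you rewrite the split labels. The following is only a sketch: the helper name only_var_names is made up, the ptitanic data set (shipped with rpart.plot) stands in for your dataset, and the regular expression assumes the default labels look like "sex = male" or "age >= 9.5".

```r
library(rpart)
library(rpart.plot)

# ptitanic ships with rpart.plot; it stands in for the asker's `dataset`.
data(ptitanic)
model <- rpart(survived ~ ., data = ptitanic)

# Hypothetical helper: prp() passes the default split labels in `labs`
# (assumed to look like "sex = male" or "age >= 9.5"). Keep only the text
# before the first comparison operator, i.e. the variable name.
only_var_names <- function(x, labs, digits, varlen, faclen) {
  sub("\\s*(=|<|>).*$", "", labs)
}

prp(model, split.fun = only_var_names)

# Independent of plotting: list the variables actually used for splits.
used <- setdiff(unique(as.character(model$frame$var)), "<leaf>")
print(used)
```

With the values removed, the plot no longer says which levels go left or right, so it is worth keeping print(model) at hand for the full split details.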

Upvotes: 2
