Reputation: 1
I created a decision tree model in which all the variables are categorical. Some of these variables have over 100 possible values.
Here is my code:
library(rpart)
model = rpart(score ~ ., data = dataset)
plot(model)
text(model)
The problem is that text(model) annotates each split node with a long list of values for the corresponding categorical variable. The values are squeezed together and hard to read. I am looking for an option that makes text(model) display only the variable name and suppress all the values. That way the plotted tree would at least be clear and show which variable is used at each node.
Thanks in advance!
Leo
Upvotes: 0
Views: 3533
Reputation: 899
The prp function in the rpart.plot package might help. It has a number of options for plotting different tree layouts, and you can abbreviate the split levels using the faclen argument.
Something like:
library(rpart.plot)
model = rpart(score ~., data = dataset)
prp(model, faclen = 2)
That might help tidy it up. (Note: setting faclen to 1 means each factor level is represented by a single letter: a for the first level, b for the second, and so on.)
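If abbreviating is still too cluttered, prp also accepts a split.fun argument, a function that receives the split labels and returns the text to draw. Below is a minimal sketch that keeps only the variable name. The helper name name_only is made up for illustration, and the regular expression assumes the labels look like "variable = a,b,c" or "variable >= 10", which may differ depending on the type and eq arguments you use.

```r
# Hypothetical helper for prp's split.fun argument.
# Assumption: each label starts with the variable name, followed by an
# operator (=, < or >) and the values, e.g. "colour = red,blue".
name_only <- function(x, labs, digits, varlen, faclen) {
  # Drop everything from the first operator onwards, keeping the name.
  sub("\\s*(=|<|>).*$", "", labs)
}

# Quick check on made-up labels, without needing a fitted model:
name_only(NULL, c("colour = red,blue,green", "age >= 10"))

# With the model from the code above, the call would be:
# prp(model, split.fun = name_only)
```

It is worth running the quick check on a couple of your own labels first, since the stripping depends entirely on the label format.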
Upvotes: 2