user149635

Reputation: 65

Display more nodes in decision tree in R?

Based on the result I have 7 nodes, and I want more than 2 of them displayed, but it seems that I keep getting only 2 nodes in the plot.

Is there a way to display more nodes and in a nicer way?

 library(rpart)

 tr1 <- rpart(leaveyrx ~ marstx.f + age + jobtitlex.f + organizationunitx.f + fteworkschedule + nationalityx.f + eesubgroupx.f + lvlx.f + sttpmx.f + staff2ndtpmx.f + staff3rdtpmx.f + staff4thtpmx.f, method = "class", data = btree)

 printcp(tr1)

 plotcp(tr1) 

 summary(tr1)

 plot(tr1, uniform=TRUE, margin = 0.2, main="Classification Tree for Exploration")
 text(tr1, use.n=TRUE, all=TRUE, cex=.5)


Upvotes: 0

Views: 3988

Answers (1)

Jeff Parker

Reputation: 1969

Your problem probably is not your plot, but rather your decision tree model itself. Can you clarify why you expect 7 nodes? When you only get two (leaf) nodes, it usually means that the model is splitting on a single predictor variable that almost perfectly separates your binary response. This is often caused by a predictor having a 1:1 relationship with the response. For example, if you are predicting Gender (Male, Female) and one of your predictor variables is Sex (M, F), a decision tree is not needed because the predictor already gives you the answer. Perhaps something in the pre-processing of your data copied the response variable into a predictor. Here are a few things to look for:

1) Calculate the misclassification rate. If it is 0, your model classifies the training data perfectly, which usually points to a leaked predictor rather than a genuinely perfect model.

yhat <- predict(tr1, type = "class")       # model predictions on the training data
sum(yhat != btree$leaveyrx) / nrow(btree)  # misclassification rate (0 = perfect fit)
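A confusion matrix built with base R's table() makes it easy to see where a suspiciously perfect fit comes from:

table(predicted = yhat, actual = btree$leaveyrx)  # off-diagonal cells are the errors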

2) See which predictor your model is using. Double-check that this variable has been processed correctly, and try excluding it from the model (see the sketch below).

tr1$variable.importance  # named numeric vector, sorted from most to least important
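As a sketch of both checks: the cross-tabulation below will be close to diagonal if the top predictor is a near-copy of the response, and the refit simply drops that predictor. lvlx.f is only a stand-in here for whichever variable actually tops your importance list:

imp <- tr1$variable.importance
table(btree$leaveyrx, btree[[names(imp)[1]]])  # near-diagonal table = likely leakage

# Refit without the suspect predictor (assumed here to be lvlx.f) and see
# whether the tree now picks up more splits:
tr1b <- rpart(leaveyrx ~ marstx.f + age + jobtitlex.f + organizationunitx.f +
                fteworkschedule + nationalityx.f + eesubgroupx.f + sttpmx.f +
                staff2ndtpmx.f + staff3rdtpmx.f + staff4thtpmx.f,
              method = "class", data = btree)
printcp(tr1b)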

3) If you are absolutely sure the variable is calculated correctly and that it belongs in the model, try lowering your cp (complexity parameter) value. The default is 0.01; smaller values allow more splits, and decision trees still run quickly even with very low cp values. While you are tinkering with cp, also consider the other tuning parameters: see ?rpart.control.

control <- rpart.control(minbucket = 20,    # minimum observations allowed in any leaf
                         cp = 0.0002,       # far below the 0.01 default, so many more splits
                         maxsurrogate = 0,  # skip surrogate splits (faster)
                         usesurrogate = 0,
                         xval = 10)         # 10-fold cross-validation for the cp table
tr1 <- rpart(leaveyrx ~ marstx.f + age + jobtitlex.f + organizationunitx.f +
               fteworkschedule + nationalityx.f + eesubgroupx.f + lvlx.f +
               sttpmx.f + staff2ndtpmx.f + staff3rdtpmx.f + staff4thtpmx.f,
             data = btree,
             method = "class",
             control = control)

4) Once you have a tree with many nodes, you will need to trim it back. It may be that your best model really is driven by only one variable and hence will have only two leaf nodes.

# Plot the cp
plotcp(tr1)
printcp(tr1) # Printing cp table (choose the cp with the smallest xerror)

# Prune back to the optimal size, according to the cross-validated error (xerror)
tr1.pruned <- prune(tr1, cp=0.001)  # approximately the cp corresponding to the best size
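Rather than eyeballing the plot, you can also pull the best cp straight from the cp table that rpart already stores on the fitted object:

best.cp <- tr1$cptable[which.min(tr1$cptable[, "xerror"]), "CP"]  # cp with lowest cross-validated error
tr1.pruned <- prune(tr1, cp = best.cp)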

5) The rpart.plot package is a good resource for plotting decision trees. There are lots of great articles out there, but here is a good one on rpart.plot: http://www.milbo.org/rpart-plot/prp.pdf
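For example, a minimal sketch using rpart.plot (install the package first if you don't have it):

# install.packages("rpart.plot")
library(rpart.plot)
rpart.plot(tr1.pruned, type = 2, extra = 104)  # label every node with the class,
                                               # per-class probabilities, and % of observations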

It may also be helpful to post a bit of the summary of your model.

Upvotes: 1
