Reputation: 185
I am trying to build a decision tree with the following dataset:
RESULT EXPG_HOME R_HOME_3DAY
1 1.321 0.20
2 1.123 0.30
1 0.762 0.26
If I try this:
library(rpart)
library(rattle)  # provides fancyRpartPlot()
tree <- rpart(RESULT ~ EXPG_HOME, df, method="class")
fancyRpartPlot(tree)
It works. But when I try:
tree <- rpart(RESULT ~ R_HOME_3DAY, df, method="class")
fancyRpartPlot(tree)
I get the following error:
Error in apply(model$frame$yval2[, yval2per], 1, function(x) x[1 + x[1]]) :
dim(X) must have a positive length
Any thoughts on what goes wrong here?
Both EXPG_HOME and R_HOME_3DAY are numeric.
And this is what I get with the relevant variable:
> table(df$R_HOME_3DAY)
          0         0.1 0.133333333 0.166666667         0.2 0.233333333
         21          65          14          10         194          53
0.266666667         0.3 0.333333333 0.366666667         0.4 0.433333333
         63         248         107         185         369         169
0.466666667         0.5 0.533333333 0.566666667         0.6 0.633333333
        334         351         184         382         317         213
0.666666667         0.7 0.733333333 0.766666667         0.8 0.833333333
        336         251         112         217          92          64
0.866666667         0.9 0.933333333
         83          20           5
Upvotes: 2
Views: 7671
Reputation: 2469
What is happening is that, under the default settings, the independent variable does not provide enough information to grow your tree. The rpart
package limits how far a tree grows through default control parameters. The following is from ?rpart.control:
rpart.control(minsplit = 20,
minbucket = round(minsplit/3),
cp = 0.01,
maxcompete = 4,
maxsurrogate = 5,
usesurrogate = 2,
xval = 10,
surrogatestyle = 0,
maxdepth = 30, ...)
So, you may want to loosen the control parameters as follows:
tree <- rpart(RESULT ~ EXPG_HOME, df, method="class",
              control = rpart.control(minsplit = 1,
                                      minbucket = 1,
                                      cp = 0.001))
This will very likely produce a tree with many nodes. From there, you can tune the parameters to get a reasonable tree.
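To illustrate the effect of these parameters, here is a minimal sketch that uses only the three sample rows shown in the question. The full df is not available, so that data frame is an assumption, not the asker's data. The xval = 0 setting is an addition of mine to skip cross-validation, which is not meaningful on three rows.

```r
library(rpart)

# Assumption: only the three sample rows from the question, not the full data.
df <- data.frame(
  RESULT      = c(1, 2, 1),
  EXPG_HOME   = c(1.321, 1.123, 0.762),
  R_HOME_3DAY = c(0.20, 0.30, 0.26)
)

# Default controls: minsplit = 20 prevents any split on three rows.
strict <- rpart(RESULT ~ R_HOME_3DAY, df, method = "class")

# Loosened controls; xval = 0 (an addition here) turns off cross-validation.
loose <- rpart(RESULT ~ R_HOME_3DAY, df, method = "class",
               control = rpart.control(minsplit = 1, minbucket = 1,
                                       cp = 0.001, xval = 0))

# Each row of $frame is one node, so more than one row means a split happened.
n_strict <- nrow(strict$frame)
n_loose  <- nrow(loose$frame)
print(c(strict = n_strict, loose = n_loose))
```

On real data, settings this loose tend to overfit, so the resulting tree usually needs pruning or tighter limits.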
Upvotes: 2
Reputation: 3728
The problem is that you didn't get a tree, just a root node :)
> tree <- rpart(RESULT ~ EXPG_HOME, df, method="class")
> fancyRpartPlot(tree)
Error in apply(model$frame$yval2[, yval2per], 1, function(x) x[1 + x[1]]) :
dim(X) must have a positive length
> plot(tree)
Error in plot.rpart(tree) : fit is not a tree, just a root
> tree
n= 3
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 3 1 1 (0.6666667 0.3333333) *
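One way to guard against this before plotting is to check whether the fit contains any splits. The sketch below is only an illustration: has_splits is a hypothetical helper name, and the data frame holds just the three sample rows from the question. It relies on the frame component of an rpart fit having one row per node.

```r
library(rpart)

# Hypothetical helper: TRUE when the fit made at least one split,
# i.e. its frame has more rows than just the root node.
has_splits <- function(fit) nrow(fit$frame) > 1

# Assumption: only the three sample rows from the question.
df <- data.frame(
  RESULT      = c(1, 2, 1),
  EXPG_HOME   = c(1.321, 1.123, 0.762),
  R_HOME_3DAY = c(0.20, 0.30, 0.26)
)

tree <- rpart(RESULT ~ EXPG_HOME, df, method = "class")

if (has_splits(tree)) {
  # Base rpart plotting; fancyRpartPlot(tree) from rattle could be used instead.
  plot(tree)
  text(tree)
} else {
  message("rpart returned only a root node; nothing to plot.")
}
```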
Upvotes: 4