Reputation: 211
I'm trying to replicate the procedure proposed here on my data but I get the following error:
Error in interval.numeric(x, breaks = c(xmin - tol, ux, xmax)) :
invalid number of intervals
target is the categorical variable that I want to predict, and I would like to force the first split of the classification tree to be made on split.variable (also categorical). Because of the nature of the objects, if split.variable is 1, target can only be 1, while if it is 0, target can be either 0 or 1.
Initially I treated them as factors, but then I changed them to numeric and rounded them (as suggested in other posts on SO). Unfortunately, neither approach helped.
I also played with the data a bit, subsampling columns and rows, but it still doesn't work.
What am I missing?
Here is an MRE to replicate the error:
library(partykit)
tdf = structure(list(target = c(0, 0, 0, 1, 0, 0, 1, 1, 1, 1), split.variable = c(0,
0, 0, 0, 1, 0, 0, 0, 0, 0), var1 = c(2.021, 1.882, 1.633, 3.917,
2.134, 1.496, 1.048, 1.552, 1.65, 3.112), var2 = c(97.979, 98.118,
98.367, 96.083, 97.866, 98.504, 98.952, 98.448, 98.35, 96.888
), var3 = c(1, 1, 1, 0.98, 1, 1, 1, 1, 1, 1), var4 = c(1, 1,
1, 0.98, 1, 1, 1, 1, 1, 1), var5 = c(18.028, 25.207, 20.788,
28.548, 18.854, 19.984, 27.352, 24.622, 25.037, 24.067), var6 = c(0.213,
0.244, 0.289, 0.26, 0.887, 0.575, 0.097, 0.054, 0.104, 0.096),
var7 = c(63.22, 59.845, 62.45, 63.48, 52.143, 51.256, 56.296,
57.494, 59.543, 68.434), var8 = c(0.748, 0.795, 0.807, 0.793,
0.901, 0.909, 0.611, 0.61, 0.618, 0.589)), row.names = c(6L,
7L, 8L, 9L, 11L, 12L, 15L, 16L, 17L, 18L), class = "data.frame")
tr1 <- ctree(target ~ split.variable, data = tdf, maxdepth = 1)
tr2 <- ctree(target ~ split.variable + ., data = tdf, subset = predict(tr1, type = "node") == 2)
Upvotes: 2
Views: 365
Reputation: 17183
Your data set is too small to do what you want:
tr1 does not lead to any splits but produces a tree with a single root node.
predict(tr1, type = "node") produces a vector of 10 times 1.
The subset with predict(tr1, type = "node") == 2 is empty (all FALSE).
Additionally: I'm not sure where you found the recommendation to use numeric codings of categorical variables. But for partykit you are almost always better off coding categorical variables appropriately as factor variables.
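The diagnosis above can be checked before fitting the second tree. The following is only a minimal sketch: it rebuilds just the target and split.variable columns from the question (the other columns are not needed for the first stage), codes both as factors, and guards the second ctree() call. The name node_id and the guard are illustrative additions, not part of the original code. As far as I recall, the defaults in ctree_control() (minsplit = 20) already rule out any split with only 10 observations, which would explain the single root node.

```r
library(partykit)

# First two columns of the question's data, coded as factors
tdf <- data.frame(
  target         = factor(c(0, 0, 0, 1, 0, 0, 1, 1, 1, 1)),
  split.variable = factor(c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0))
)

# Same first-stage fit as in the question
tr1 <- ctree(target ~ split.variable, data = tdf, maxdepth = 1)

# Node each observation falls into; a root-only tree yields all 1s
node_id <- predict(tr1, type = "node")
print(table(node_id))

# Only fit the second stage if node 2 actually contains observations
if (any(node_id == 2)) {
  tr2 <- ctree(target ~ ., data = tdf, subset = node_id == 2)
} else {
  message("tr1 has no split, so there is no node 2 to subset on")
}
```

If table(node_id) shows only node 1, the subset passed to the second call would be empty, which is the situation that triggers the error in the question.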
Upvotes: 2