Nico

"invalid number of intervals" error with partykit decision trees

I'm trying to apply the procedure proposed here to my data, but I get the following error:

Error in interval.numeric(x, breaks = c(xmin - tol, ux, xmax)) : 
  invalid number of intervals

target is the categorical variable I want to predict, and I want to force the first split of the classification tree to be on split.variable (also categorical). Because of the nature of the objects, if split.variable is 1, target can only be 1, while if it is 0, target can be either 0 or 1. Initially I treated both variables as factors, but then I converted them to numeric and rounded them (as suggested in other posts on SO). Unfortunately, none of these attempts helped. I also experimented with subsets of the columns and rows, but it still doesn't work. What am I missing?

Here is an MRE to replicate the error:

library(partykit)

tdf = structure(list(target = c(0, 0, 0, 1, 0, 0, 1, 1, 1, 1), split.variable = c(0, 
0, 0, 0, 1, 0, 0, 0, 0, 0), var1 = c(2.021, 1.882, 1.633, 3.917, 
2.134, 1.496, 1.048, 1.552, 1.65, 3.112), var2 = c(97.979, 98.118, 
98.367, 96.083, 97.866, 98.504, 98.952, 98.448, 98.35, 96.888
), var3 = c(1, 1, 1, 0.98, 1, 1, 1, 1, 1, 1), var4 = c(1, 1, 
1, 0.98, 1, 1, 1, 1, 1, 1), var5 = c(18.028, 25.207, 20.788, 
28.548, 18.854, 19.984, 27.352, 24.622, 25.037, 24.067), var6 = c(0.213, 
0.244, 0.289, 0.26, 0.887, 0.575, 0.097, 0.054, 0.104, 0.096), 
    var7 = c(63.22, 59.845, 62.45, 63.48, 52.143, 51.256, 56.296, 
    57.494, 59.543, 68.434), var8 = c(0.748, 0.795, 0.807, 0.793, 
    0.901, 0.909, 0.611, 0.61, 0.618, 0.589)), row.names = c(6L, 
7L, 8L, 9L, 11L, 12L, 15L, 16L, 17L, 18L), class = "data.frame")

tr1 <- ctree(target ~ split.variable,     data = tdf, maxdepth = 1)
tr2 <- ctree(target ~ split.variable + ., data = tdf, subset = predict(tr1, type = "node") == 2)

Answers (1)

Achim Zeileis

Your data set is too small to do what you want:

  • With just 10 observations, tr1 does not split at all; it produces a tree consisting of a single root node.
  • Consequently, predict(tr1, type = "node") returns a vector of ten 1s (every observation falls into node 1).
  • Thus, the subset with predict(tr1, type = "node") == 2 is empty (all FALSE).
  • This leads to an (admittedly cryptic) error message, reflecting that you cannot learn a tree from an empty data set.
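A minimal sketch for checking this before fitting the second tree. It uses a reduced copy of the question's data (only the two columns tr1 needs), and the guard around tr2 is a hypothetical addition, not part of the original procedure. Printing ctree_control()$minsplit shows the minimum node size ctree() requires before it attempts a split (20 by default, as far as I know), which 10 observations do not reach:

```r
library(partykit)

# Reduced copy of the question's data: only the two columns tr1 uses
tdf <- data.frame(
  target         = c(0, 0, 0, 1, 0, 0, 1, 1, 1, 1),
  split.variable = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
)

# Minimum number of observations a node needs before ctree() tries to split it
print(ctree_control()$minsplit)

tr1 <- ctree(target ~ split.variable, data = tdf, maxdepth = 1)

# Terminal node ids of tr1; a root-only tree has exactly one (id 1)
term_ids <- nodeids(tr1, terminal = TRUE)
print(term_ids)

# Terminal node for every observation: all 1 here
nodes <- predict(tr1, type = "node")
print(table(nodes))

# Hypothetical guard: only fit tr2 if node 2 actually contains observations
sel <- nodes == 2
if (any(sel)) {
  tr2 <- ctree(target ~ ., data = tdf, subset = sel)
} else {
  message("tr1 has no node 2 (", length(term_ids),
          " terminal node), so tr2 is not fitted")
}
```

With the full data from the question the result is the same, because tr1 only looks at split.variable.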

Additionally, I'm not sure where you found the recommendation to use numeric codings for categorical variables, but with partykit you are almost always better off coding categorical variables as factors.
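A minimal sketch of such a recoding in base R, assuming the 0/1 columns from the question (only var1 is kept from the numeric predictors, for brevity). With a factor response, ctree() fits a classification tree rather than a regression tree on the 0/1 numbers:

```r
# Reduced copy of the question's data with the 0/1 variables stored as numbers
tdf <- data.frame(
  target         = c(0, 0, 0, 1, 0, 0, 1, 1, 1, 1),
  split.variable = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0),
  var1           = c(2.021, 1.882, 1.633, 3.917, 2.134,
                     1.496, 1.048, 1.552, 1.65, 3.112)
)

# Recode the categorical columns as factors with explicit levels, so that
# target is treated as a class label and split.variable as a categorical
# split variable
tdf$target         <- factor(tdf$target, levels = c(0, 1))
tdf$split.variable <- factor(tdf$split.variable, levels = c(0, 1))

str(tdf)
```

Listing the levels explicitly keeps both classes present even in a subset where one of them does not occur.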
