Łukasz Deryło

Reputation: 1860

ctree ignores variables with non-syntactic names?

Does the partykit::ctree function ignore variables with non-syntactic names, or am I missing something?

Toy example:

library(partykit)

myData <- data.frame(
   Y = factor(rep(LETTERS[1:3], each=10)),
   x1 = 1:30,
   x2 = c(1:10,2:11,3:12)
 )

Clearly x1 is the best "predictor" of Y:

ctree(Y~., data=myData)

Model formula:
Y ~ x1 + x2

Fitted party:
[1] root
|   [2] x1 <= 10: A (n = 10, err = 0,0%)
|   [3] x1 > 10
|   |   [4] x1 <= 20: B (n = 10, err = 0,0%)
|   |   [5] x1 > 20: C (n = 10, err = 0,0%)

Number of inner nodes:    2
Number of terminal nodes: 3

But when I change its name to a non-syntactic one, it seems to be ignored in the tree construction process:

 myData<-data.frame(
   Y = factor(rep(LETTERS[1:3], each=10)),
   `x 1` = 1:30,
   x2 = c(1:10,2:11,3:12),
   check.names = FALSE
 )
 
ctree(Y~., data=myData)

Model formula:
Y ~ `x 1` + x2

Fitted party:
[1] root: A (n = 30, err = 66,7%) 

Number of inner nodes:    0
Number of terminal nodes: 1

Can you suggest a way to work around this behaviour? (I really, really wish to use `x 1` as a name; don't ask why.)

Upvotes: 1

Views: 126

Answers (1)

Achim Zeileis

Reputation: 17168

Thanks for pointing this out. This was indeed a bug in partykit::ctree but has been fixed now in version 1.2-11 (the current development version on R-Forge).
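To see whether an installed copy already contains the fix, the version can be compared against 1.2-11. This is only a minimal sketch and assumes partykit is installed; the variable name has_fix is made up for illustration.

```r
# Assumption: partykit is installed. packageVersion() lives in the utils package.
# has_fix is a hypothetical name used only in this sketch.
has_fix <- utils::packageVersion("partykit") >= "1.2-11"

if (!has_fix) {
  message("Installed partykit is older than 1.2-11; the workaround may be needed.")
}
```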

Furthermore, if you just want the non-syntactic label to be used in printing and plotting, you can use the following quick & dirty workaround. First fit the tree on data with nice syntactic names:

library(partykit)

myData <- data.frame(
  Y = factor(rep(LETTERS[1:3], each = 10)),
  x1 = 1:30,
  x2 = c(1:10, 2:11, 3:12)
)
ct <- ctree(Y ~ ., data = myData)

Then, after fitting the tree, change the name of the variable in the $data stored in the tree:

names(ct$data)[2] <- "x 1"

The new name is then used in printing and plotting:

print(ct)
## Model formula:
## Y ~ x1 + x2
## 
## Fitted party:
## [1] root
## |   [2] x 1 <= 10: A (n = 10, err = 0.0%)
## |   [3] x 1 > 10
## |   |   [4] x 1 <= 20: B (n = 10, err = 0.0%)
## |   |   [5] x 1 > 20: C (n = 10, err = 0.0%)
plot(ct)
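If the data already arrive with non-syntactic names, the same workaround can be applied to all columns at once. The following is a rough sketch rather than anything built into partykit: it uses base R's make.names() to create temporary syntactic names, fits the tree, and then restores the original labels in the stored $data. The names orig, safe, labels and lookup are made up for illustration, and the sketch assumes the stored $data keeps the column names of the data passed to ctree.

```r
library(partykit)

# Data with a non-syntactic column name, as in the question.
orig <- data.frame(
  Y = factor(rep(LETTERS[1:3], each = 10)),
  `x 1` = 1:30,
  x2 = c(1:10, 2:11, 3:12),
  check.names = FALSE
)

# Remember the original labels and fit on a copy with syntactic names.
labels <- names(orig)
safe <- orig
names(safe) <- make.names(labels, unique = TRUE)  # "x 1" becomes "x.1"

ct <- ctree(Y ~ ., data = safe)

# Map the syntactic names back to the original labels, touching only
# columns of the stored data that were actually renamed by us.
lookup <- setNames(labels, names(safe))
hit <- names(ct$data) %in% names(lookup)
names(ct$data)[hit] <- unname(lookup[names(ct$data)[hit]])

print(ct)
```

As with the single-column version, this only changes the labels used for display; the model formula stored in the tree still refers to the syntactic names.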


Upvotes: 2
