If I do DF$where <- tree$where after fitting an rpart object using DF as my data, will each row be mapped to its corresponding leaf through the column where?
Thanks!
To demonstrate that this is likely true (assuming I have understood your question correctly), we can work with the first example in ?rpart:
require(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
kyphosis$where <- fit$where
> str(kyphosis)
'data.frame': 81 obs. of 5 variables:
$ Kyphosis: Factor w/ 2 levels "absent","present": 1 1 2 1 1 1 1 1 1 2 ...
$ Age : int 71 158 128 2 1 1 61 37 113 59 ...
$ Number : int 3 3 4 5 4 2 2 3 2 6 ...
$ Start : int 5 14 5 1 15 16 17 16 16 12 ...
$ where : int 9 7 9 9 3 3 3 3 3 8 ...
> plot(fit)
> text(fit, use.n = TRUE)
And now look at some tables based on the 'where' vector and some logical tests:
First node:
> with(kyphosis, table(where, Start >= 8.5))
where FALSE TRUE
    3     0   29
    5     0   12
    7     0   14
    8     0    7
    9    19    0   # where == 9 is the only group with Start < 8.5; 9 is a row index into fit$frame
> fit$frame[9,]
     var  n wt dev yval complexity ncompete nsurrogate  yval2.V1
3 <leaf> 19 19   8    2       0.01        0          0 2.0000000
    yval2.V2   yval2.V3   yval2.V4   yval2.V5 yval2.nodeprob
3  8.0000000 11.0000000  0.4210526  0.5789474      0.2345679
Second node:
> with(kyphosis, table(where, Start >= 8.5, Start>=14.5))
, ,  = FALSE

where FALSE TRUE
    3     0    0
    5     0   12
    7     0   14
    8     0    7
    9    19    0

, ,  = TRUE

where FALSE TRUE
    3     0   29
    5     0    0
    7     0    0
    8     0    0
    9     0    0
And this is the row of fit$frame for the leaf reached through the second split (Start >= 14.5). Note that row 3 of the frame carries the node label 4:
> fit$frame[3,]
     var  n wt dev yval complexity ncompete nsurrogate  yval2.V1
4 <leaf> 29 29   0    1       0.01        0          0 1.0000000
    yval2.V2   yval2.V3   yval2.V4   yval2.V5 yval2.nodeprob
4 29.0000000  0.0000000  1.0000000  0.0000000      0.3580247
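The counts in the two tables above can also be cross-checked against fit$frame in one step. This is only a minimal sketch on the same kyphosis example; the names counts and frame_n are mine, not part of rpart.

```r
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Number of training rows assigned to each distinct value of fit$where
counts <- table(fit$where)

# The n column of fit$frame, taken at those same row indices
frame_n <- fit$frame$n[as.integer(names(counts))]

# TRUE when every group size agrees with the n recorded in the frame
all(as.vector(counts) == frame_n)
```

If the group sizes agree with the n column, that supports reading fit$where as a row index into fit$frame.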
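So I would characterize the values of fit$where as identifying the "terminal nodes", the rows labeled '<leaf>' in fit$frame, which may or may not be what you were calling the "nodes". Note that each value is a row index into fit$frame, not the node number shown in its row names.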
> fit$frame
      var  n wt dev yval complexity ncompete nsurrogate   yval2.V1
1   Start 81 81  17    1 0.17647059        2          1 1.00000000
2   Start 62 62   6    1 0.01960784        2          2 1.00000000
4  <leaf> 29 29   0    1 0.01000000        0          0 1.00000000
5     Age 33 33   6    1 0.01960784        2          2 1.00000000
10 <leaf> 12 12   0    1 0.01000000        0          0 1.00000000
11    Age 21 21   6    1 0.01960784        2          0 1.00000000
22 <leaf> 14 14   2    1 0.01000000        0          0 1.00000000
23 <leaf>  7  7   3    2 0.01000000        0          0 2.00000000
3  <leaf> 19 19   8    2 0.01000000        0          0 2.00000000
      yval2.V2    yval2.V3    yval2.V4    yval2.V5 yval2.nodeprob
1  64.00000000 17.00000000  0.79012346  0.20987654     1.00000000
2  56.00000000  6.00000000  0.90322581  0.09677419     0.76543210
4  29.00000000  0.00000000  1.00000000  0.00000000     0.35802469
5  27.00000000  6.00000000  0.81818182  0.18181818     0.40740741
10 12.00000000  0.00000000  1.00000000  0.00000000     0.14814815
11 15.00000000  6.00000000  0.71428571  0.28571429     0.25925926
22 12.00000000  2.00000000  0.85714286  0.14285714     0.17283951
23  3.00000000  4.00000000  0.42857143  0.57142857     0.08641975
3   8.00000000 11.00000000  0.42105263  0.57894737     0.23456790
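Since the values of fit$where index rows of fit$frame while the row names hold the node numbers, the two can be lined up explicitly. The following is a hedged sketch rather than anything from the rpart documentation; leaf_rows and node_id are illustrative names. It also assumes no rows were dropped during fitting (kyphosis has no missing values). If rpart had removed rows through its na.action, fit$where would be shorter than the data frame and a plain DF$where <- tree$where assignment would not line up.

```r
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# One frame row per training observation, looked up through fit$where
leaf_rows <- fit$frame[fit$where, ]

# Every observation should land on a terminal node
all(leaf_rows$var == "<leaf>")

# Translate row indices into node numbers (the row names of fit$frame)
node_id <- as.integer(rownames(fit$frame))[fit$where]

# Cross-tabulate row index against node number; each row index
# should pair with exactly one node number
table(where = fit$where, node = node_id)

# Assumption being relied on: one where value per row of the data
length(fit$where) == nrow(kyphosis)
```

With this fit, row indices 3 and 9 correspond to the node labels 4 and 3 seen in the fit$frame output above.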