I want to use random survival forests to predict a cumulative hazard for each patient, and then to recommend a treatment by choosing the one with the minimum cumulative hazard. I think I am close, but I am not sure what one of the outputs of the randomForestSRC package contains.
The data I am using is the GBSG2 breast cancer data. The patients either received hormone treatment or not.
Here is my code so far:
```r
# load data and packages
library(TH.data)
library(randomForestSRC)
data(GBSG2)

# 75/25 train/test split
set.seed(123)
smp_size <- floor(0.75 * nrow(GBSG2))
train_ind <- sample(seq_len(nrow(GBSG2)), size = smp_size)
train <- GBSG2[train_ind, ]
test <- GBSG2[-train_ind, ]

# fit the random survival forest (horTh is one of the predictors)
rf.fit <- rfsrc(Surv(time, cens) ~ ., data = train, ntree = 100)

# predict on the held-out patients
rf.pred <- predict(rf.fit, test)

# ensemble cumulative hazard for the test patients
rf.pred$chf
```
I am a little confused about the output. I expected each patient to have one cumulative hazard for treatment and one for no treatment, so I am not sure why I get four values for each patient.
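rf.pred$chf is a matrix with rf.pred$n rows (one per patient in test) and length(rf.pred$time.interest) columns (one per time point in rf.pred$time.interest). Each row is that patient's estimated cumulative hazard curve over time, not one value per treatment group. For information on the relevant terminal node statistics and ensembles, please refer to the Theory and Specifications section on our GitHub page: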
https://kogalur.github.io/randomForestSRC/
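A minimal sketch, reusing the setup from the question, that checks the shape of rf.pred$chf and pairs one patient's row with the time points it refers to. Depending on the package version, rfsrc may restrict time.interest to a grid controlled by its ntime argument, so the number of columns can be smaller than the number of distinct event times in train.

```r
library(TH.data)
library(randomForestSRC)

data(GBSG2)
set.seed(123)
train_ind <- sample(seq_len(nrow(GBSG2)), size = floor(0.75 * nrow(GBSG2)))
train <- GBSG2[train_ind, ]
test  <- GBSG2[-train_ind, ]

rf.fit  <- rfsrc(Surv(time, cens) ~ ., data = train, ntree = 100)
rf.pred <- predict(rf.fit, test)

# rows = test patients, columns = time points in time.interest
dim(rf.pred$chf)
c(rf.pred$n, length(rf.pred$time.interest))

# cumulative hazard curve of the first test patient, one value per time point
head(data.frame(time = rf.pred$time.interest, chf = rf.pred$chf[1, ]))
```

Reading a single row left to right traces how that patient's cumulative hazard grows over follow-up time (in days for GBSG2).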
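For the treatment choice described in the question, one possible approach (not part of the answer) is to predict every test patient twice, once with horTh set to "no" and once with it set to "yes", read both cumulative hazard curves at a fixed horizon, and pick the level with the lower value. The helper chf_at, the 5-year horizon, and breaking ties toward "no" are assumptions made for illustration. This compares model predictions only, so it reflects a treatment effect only to the extent the forest has learned one from these observational data.

```r
library(TH.data)
library(randomForestSRC)

data(GBSG2)
set.seed(123)
train_ind <- sample(seq_len(nrow(GBSG2)), size = floor(0.75 * nrow(GBSG2)))
train <- GBSG2[train_ind, ]
test  <- GBSG2[-train_ind, ]

rf.fit <- rfsrc(Surv(time, cens) ~ ., data = train, ntree = 100)

# Hypothetical helper: cumulative hazard of every row of newdata at `horizon`,
# after forcing the treatment variable horTh to `level` for all rows.
chf_at <- function(fit, newdata, level, horizon) {
  lv <- levels(newdata$horTh)                     # keep the original factor levels
  newdata$horTh <- factor(rep(level, nrow(newdata)), levels = lv)
  pred <- predict(fit, newdata)
  # column of the last time point not exceeding the horizon
  j <- max(which(pred$time.interest <= horizon))
  pred$chf[, j]
}

horizon <- 5 * 365                                # assumed horizon: 5 years, in days

chf_no  <- chf_at(rf.fit, test, "no",  horizon)
chf_yes <- chf_at(rf.fit, test, "yes", horizon)

# recommend hormone treatment only when it gives a strictly lower cumulative hazard
recommended <- ifelse(chf_yes < chf_no, "yes", "no")
table(recommended)
```

Using a single horizon turns each curve into one number per patient; comparing whole curves (for example, the area under each) would be an alternative design choice.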