Alex

Reputation: 2780

randomForestSRC - cumulative hazard per patient

I want to use random survival forests to predict the cumulative hazard for each patient, and then recommend a treatment by choosing the one with the lower predicted cumulative hazard. I think I am close, but I am not sure what one of the outputs from the randomForestSRC package represents.

The data I am using is the GBSG2 breast cancer data. The patients either received hormone treatment or not.

Here is my code so far

#load data
library(TH.data)
data(GBSG2)

#test and train
smp_size <- floor(0.75 * nrow(GBSG2))
set.seed(123)
train_ind <- sample(seq_len(nrow(GBSG2)), size = smp_size)
train <- GBSG2[train_ind, ]
test <- GBSG2[-train_ind, ]

#rsf fit
library(randomForestSRC)
rf.fit <- rfsrc(Surv(time, cens) ~ ., data = train, ntree = 100)
#rsf predict
rf.pred <- predict(rf.fit, test)

#rsf cumulative hazard
rf.pred$chf

[Screenshot of the rf.pred$chf output]

I am a little confused about the output. I was assuming that for each patient I would have a cumulative hazard for treatment vs non-treatment. I am not sure why I have four values for each patient.

Upvotes: 1

Views: 374

Answers (1)

Udaya Kogalur

Reputation: 161

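rf.pred$chf is a matrix of dimension [rf.pred$n] x [length(rf.pred$time.interest)]: one row per patient in the test data and one column per time point in rf.pred$time.interest. Each row is therefore that patient's estimated cumulative hazard curve evaluated at those time points, not a set of per-treatment values. For information on the relevant terminal node statistics and ensembles, please refer to the Theory and Specifications section on our GitHub Page: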

https://kogalur.github.io/randomForestSRC/
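To connect this back to the question, here is a minimal sketch of one way to get a cumulative hazard per patient under each treatment. It repeats the fit from the question, checks the dimensions of the CHF matrix, then predicts twice on copies of the test data with horTh forced to "no" and to "yes". The helper predict_chf_under, the 1825-day (roughly 5-year) horizon, and the "pick the lower CHF" rule are assumptions made for illustration, not something the package prescribes, and the result is a model-based comparison rather than a validated treatment-effect estimate.

```r
library(TH.data)
library(randomForestSRC)

# Same data split and fit as in the question
data(GBSG2)
set.seed(123)
train_ind <- sample(seq_len(nrow(GBSG2)), size = floor(0.75 * nrow(GBSG2)))
train <- GBSG2[train_ind, ]
test  <- GBSG2[-train_ind, ]

rf.fit  <- rfsrc(Surv(time, cens) ~ ., data = train, ntree = 100)
rf.pred <- predict(rf.fit, newdata = test)

# One row per test patient, one column per time point of interest
dim(rf.pred$chf)
length(rf.fit$time.interest)

# Hypothetical helper: predict the CHF for every test patient as if
# they had all received the given level of horTh ("no" or "yes")
predict_chf_under <- function(level) {
  newdata <- test
  newdata$horTh <- factor(rep(level, nrow(newdata)),
                          levels = levels(train$horTh))
  predict(rf.fit, newdata = newdata)$chf
}

chf_no  <- predict_chf_under("no")
chf_yes <- predict_chf_under("yes")

# Assumed horizon: compare the curves at the time point closest to 1825 days
horizon <- 1825
j <- which.min(abs(rf.fit$time.interest - horizon))

# Illustrative rule: recommend whichever option has the lower CHF at the horizon
recommended <- ifelse(chf_yes[, j] < chf_no[, j], "yes", "no")
table(recommended)
```

The choice of horizon matters: the two curves can cross, so a recommendation based on a single column may change if a different time point is used.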

Upvotes: 1
