Harlan Nelson
Harlan Nelson

Reputation: 1502

What does a tidymodels survival fit require more than one predictor

Can a survival model with just the treatment as a predictor be fit with a tidymodels survival function?

Here I mention the example, which uses many predictors, then try to duplicated it with only one predictor. This fails.

https://www.tidyverse.org/blog/2021/11/survival-analysis-parsnip-adjacent/ has code to fit a survival tidymodel

library(survival)

bladder_train <- bladder[-c(1:3),]
bladder_test <- bladder[1:3,]

cox_spec <- proportional_hazards(penalty = 0.123) %>%
    set_engine("glmnet")

f_fit <- fit(cox_spec,
             Surv(stop, event) ~ rx + size + number + strata(enum),
             data = bladder_train)

But with only the treatment in the model, it does not work

f_fit <- fit(cox_spec,
             Surv(stop, event) ~ rx,
             data = bladder_train)

Why? What am I missing

Upvotes: 0

Views: 202

Answers (1)

Desmond
Desmond

Reputation: 1137

It seems the error has more to do with glmnet than tidymodels. This is the error:

library(survival)
library(censored)
#> Loading required package: parsnip
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip

bladder_train <- bladder[-c(1:3), ]
bladder_test <- bladder[1:3, ]

cox_spec <- proportional_hazards(penalty = 0.123) %>%
  set_engine("glmnet")

f_fit <- fit(cox_spec,
    Surv(stop, event) ~ rx,
    data = bladder_train)
#> Error in glmnet::glmnet(data_obj$x, data_obj$y, family = "cox", alpha = alpha, : x should be a matrix with 2 or more columns

Created on 2021-12-30 by the reprex package (v2.0.1)

glmnet needs a mtrix with 2 or more columns. Using just rx means you'd have just one column. If I add size as an additional feature, it works just fine.

library(survival)
library(censored)
#> Loading required package: parsnip
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip

bladder_train <- bladder[-c(1:3), ]
bladder_test <- bladder[1:3, ]

cox_spec <- proportional_hazards(penalty = 0.123) %>%
  set_engine("glmnet")

f_fit <- fit(cox_spec,
    Surv(stop, event) ~ rx + size,
    data = bladder_train)

f_fit
#> parsnip model object
#> 
#> Fit time:  NA 
#> 
#> Call:  glmnet::glmnet(x = data_obj$x, y = data_obj$y, family = "cox",      alpha = alpha, lambda = lambda) 
#> 
#>    Df %Dev   Lambda
#> 1   0 0.00 0.070850
#> 2   1 0.10 0.064560
#> 3   1 0.18 0.058820
#> 4   1 0.24 0.053600
#> 5   1 0.30 0.048840
#> 6   2 0.35 0.044500
#> 7   2 0.43 0.040550
#> 8   2 0.50 0.036940
#> 9   2 0.55 0.033660
#> 10  2 0.60 0.030670
#> 11  2 0.64 0.027950
#> 12  2 0.67 0.025460
#> 13  2 0.70 0.023200
#> 14  2 0.72 0.021140
#> 15  2 0.74 0.019260
#> 16  2 0.75 0.017550
#> 17  2 0.77 0.015990
#> 18  2 0.78 0.014570
#> 19  2 0.79 0.013280
#> 20  2 0.79 0.012100
#> 21  2 0.80 0.011020
#> 22  2 0.81 0.010040
#> 23  2 0.81 0.009151
#> 24  2 0.81 0.008338
#> 25  2 0.82 0.007597
#> 26  2 0.82 0.006922
#> 27  2 0.82 0.006308
#> 28  2 0.82 0.005747
#> 29  2 0.82 0.005237
#> 30  2 0.83 0.004771
#> 31  2 0.83 0.004348
#> 32  2 0.83 0.003961
#> 33  2 0.83 0.003609
#> 34  2 0.83 0.003289
#> 35  2 0.83 0.002997
#> 36  2 0.83 0.002730
#> 37  2 0.83 0.002488
#> 38  2 0.83 0.002267
#> 39  2 0.83 0.002065
#> 40  2 0.83 0.001882
#> 41  2 0.83 0.001715
#> The training data has been saved for prediction.

Created on 2021-12-30 by the reprex package (v2.0.1)

If you wanted to just use one feature rx, consider other models e.g. decision trees

library(survival)
library(censored)
#> Loading required package: parsnip
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip

bladder_train <- bladder[-c(1:3), ]
bladder_test <- bladder[1:3, ]

dt_spec <- decision_tree() %>% 
  set_engine("rpart") %>% 
  set_mode("censored regression")

f_fit <- fit(dt_spec,
    Surv(stop, event) ~ rx,
    data = bladder_train)

f_fit
#> parsnip model object
#> 
#> $rpart
#> n= 337 
#> 
#> node), split, n, deviance, yval
#>       * denotes terminal node
#> 
#> 1) root 337 403.0968 1.0000000  
#>   2) rx>=1.5 152 166.6335 0.7751669 *
#>   3) rx< 1.5 185 231.2927 1.1946030 *
#> 
#> $survfit
#> 
#> Call: prodlim::prodlim(formula = form, data = data)
#> Stratified Kaplan-Meier estimator for the conditional event time survival function
#> Discrete predictor variable: rpartFactor (0.775166899958249, 1.19460305107131)
#> 
#> $levels
#> [1] "0.775166899958249" "1.19460305107131" 
#> 
#> attr(,"class")
#> [1] "pecRpart"

Created on 2021-12-30 by the reprex package (v2.0.1)

Upvotes: 2

Related Questions