Error in R survival analysis using for loop for events

Question

I have moderate experience with R. I am trying to run Cox regressions in a for loop using the survival package. My data frame (df1) contains multiple health outcomes coded as "events". For each outcome, I want to model the time to that event as a function of "FA_low", adding age, sex and pc1-pc10 as covariates.

This is a subset of the data frame (df1) that I generated using dput(df1[1:2, -c(3,4)]):

structure(list(id = c("1000016", "1000028"), FA_low = c("1", 
"1"), sex = c("F", "F"), age = c(56L, 66L), pc1 = c(125.117, 
-9.61593), pc2 = c(-67.8548, 5.7494), pc3 = c(57.7852, -1.71108
), pc4 = c(7.68796, -4.73091), pc5 = c(0.445619, -3.22911), pc6 = c(2.93785, 
-0.0760323), pc7 = c(7.02217, 2.93723), pc8 = c(4.40888, 0.982279
), pc9 = c(-0.704416, -0.161818), pc10 = c(5.46248, -0.579022
), time = c(5, 5), '250' = c(FALSE, FALSE), '250.2' = c(FALSE, 
FALSE), '250.23' = c(FALSE, FALSE), '272' = c(NA, FALSE), '272.1' = c(NA, 
FALSE), '272.11' = c(NA, FALSE), '274.1' = c(FALSE, FALSE), '278' = c(FALSE, 
FALSE), '278.1' = c(FALSE, FALSE), '351' = c(FALSE, FALSE), '401' = c(NA, 
FALSE), '401.1' = c(NA, FALSE), '411' = c(NA, FALSE), '411.4' = c(NA, 
FALSE), '411.8' = c(NA, FALSE), '454' = c(FALSE, FALSE), '454.1' = c(FALSE, 
FALSE), '512.7' = c(FALSE, FALSE), '550' = c(NA, FALSE), '550.2' = c(NA, 
FALSE), '550.4' = c(NA, FALSE), '740' = c(NA, FALSE), '740.1' = c(NA, 
FALSE), '907' = c(FALSE, FALSE)), row.names = 1:2, class = "data.frame")

Structure:

'data.frame':   426295 obs. of  41 variables:
 $ id             : chr  "1000016" "1000028" "1000033" "1000042" ...
 $ FA_low         : chr  "1" "1" "0" "0" ...
 $ sex            : chr  "F" "F" "F" "F" ...
 $ age            : int  56 66 64 50 69 63 42 41 62 64 ...
 $ pc1            : num  125.12 -9.62 -12.53 -12.29 -11.33 ...
 $ time           : num  5 5 5 5 5 5 5 5 5 5 ...
 $ 250            : logi  FALSE FALSE FALSE NA FALSE FALSE ...

.

When I run the analysis separately for each health outcome, without a loop, it works fine. The problem appears when I loop over the health outcomes as follows:

for (i in 1:24) {
  df.model <- na.omit(df1[c(1:17, 17 + i)])

  cox.mod <- coxph(Surv(time, i) ~ FA_low + age + sex + pc1 + pc2 + pc3 + pc4 + pc5 + pc6 + pc7 + pc8 + pc9 + pc10, data = df.model)

  cox1 <- summary(cox.mod)
}

I get the following error:

Error in Surv(time, i) : Time and status are different lengths

The number of observations in these columns is the same. I suspect my for loop does not match the way the Surv() function works. I went through the documentation for Surv() but still can't solve this. I have seen questions and answers about looping over 'time', but not over events. How do I write a for loop that iterates over the event columns in this survival analysis?

Skaqqs · Accepted Answer

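I think the error you're seeing is related to how Surv() expects its arguments inside coxph(). It expects the status variable itself (e.g. a column name), not the column's position (i.e. your use of i). Inside the formula, i is evaluated as an ordinary value, a single number such as 1, rather than as a reference to the i-th column, so Surv() receives a time vector with one value per row but a status of length 1. One solution is to pass the status values directly. Check this out: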

library(survival)
#> Warning: package 'survival' was built under R version 4.0.5

test1 <- list(time=c(4,3,1,1,2,2,3), 
              status=c(1,1,1,0,1,1,0), 
              x=c(0,2,1,1,1,0,0), 
              sex=c(0,0,0,0,1,1,1),
              status2=c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)) 

## This works

coxph(Surv(time, status) ~ x + strata(sex), test1)
#> Call:
#> coxph(formula = Surv(time, status) ~ x + strata(sex), data = test1)
#> 
#>     coef exp(coef) se(coef)     z     p
#> x 0.8023    2.2307   0.8224 0.976 0.329
#> 
#> Likelihood ratio test=1.09  on 1 df, p=0.2971
#> n= 7, number of events= 5

## This doesn't work

coxph(Surv(time, 2) ~ x + strata(sex), test1)
#> Error in Surv(time, 2): Time and status are different lengths

## This works

coxph(Surv(time, test1[[2]]) ~ x + strata(sex), test1)
#> Call:
#> coxph(formula = Surv(time, test1[[2]]) ~ x + strata(sex), data = test1)
#> 
#>     coef exp(coef) se(coef)     z     p
#> x 0.8023    2.2307   0.8224 0.976 0.329
#> 
#> Likelihood ratio test=1.09  on 1 df, p=0.2971
#> n= 7, number of events= 5
Created on 2021-09-01 by the reprex package (v2.0.1)

Note that in my example (from the survival documentation), test1 is a list. You may need to use df.model[, i] or convert df.model to a list. Also, shouldn't the index in Surv() always be 18, since the 18th column of df.model holds the event data in every iteration?
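A minimal sketch of a working loop, assuming the event columns can be addressed by name: it builds each formula as a string, wrapping the name in backticks because names such as 250 are not syntactic R names. The data frame below is simulated stand-in data (only a few of the question's columns, with made-up values), and covariates, event_cols, results and hr are illustrative names, not part of the original code.

```r
library(survival)

# Hypothetical stand-in for df1: same kinds of columns as in the question
# (character FA_low and sex, logical event columns with numeric names),
# but simulated values and only pc1 instead of pc1-pc10.
set.seed(1)
n <- 200
df1 <- data.frame(
  id     = as.character(seq_len(n)),
  FA_low = sample(c("0", "1"), n, replace = TRUE),
  sex    = sample(c("F", "M"), n, replace = TRUE),
  age    = sample(40:70, n, replace = TRUE),
  pc1    = rnorm(n),
  time   = round(runif(n, 1, 5), 1),
  `250`  = sample(c(TRUE, FALSE, NA), n, replace = TRUE,
                  prob = c(0.3, 0.65, 0.05)),
  `401`  = sample(c(TRUE, FALSE), n, replace = TRUE),
  check.names = FALSE
)

covariates <- c("FA_low", "age", "sex", "pc1")  # extend with pc2-pc10 for real data
event_cols <- c("250", "401")                   # names of the event columns

results <- list()
for (ev in event_cols) {
  # Keep only the columns this model uses, then drop rows with any NA,
  # so time and status always have the same length.
  df.model <- na.omit(df1[, c("time", ev, covariates)])

  # Build the formula from the column name; backticks are required
  # because names like 250 are not syntactic R names.
  f <- as.formula(paste0("Surv(time, `", ev, "`) ~ ",
                         paste(covariates, collapse = " + ")))

  # Surv() accepts a logical status (TRUE = event).
  results[[ev]] <- summary(coxph(f, data = df.model))
}

# Hazard ratio for FA_low ("1" vs "0") from each model
hr <- sapply(results, function(s) s$conf.int["FA_low1", "exp(coef)"])
print(hr)
```

Building the formula from the name, rather than indexing with df.model[[18]], keeps each model independent of column positions and stores every summary under its outcome name. For the real data, event_cols could presumably be set to names(df1)[18:41], assuming the column layout implied by the original df1[c(1:17, 17 + i)] indexing.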
