Reputation: 3
I have moderate experience with R. I am trying to run a Cox regression in a for loop using the survival package. My dataframe (df1) contains multiple health outcomes as "events". I want to model each health outcome and time as a function of "FA_low", adding age, sex, and pc1-pc10 as covariates.
This is a subset of the dataframe (df1) that I generated using dput(df1[1:2, -c(3,4)]):
structure(list(id = c("1000016", "1000028"), FA_low = c("1",
"1"), sex = c("F", "F"), age = c(56L, 66L), pc1 = c(125.117,
-9.61593), pc2 = c(-67.8548, 5.7494), pc3 = c(57.7852, -1.71108
), pc4 = c(7.68796, -4.73091), pc5 = c(0.445619, -3.22911), pc6 = c(2.93785,
-0.0760323), pc7 = c(7.02217, 2.93723), pc8 = c(4.40888, 0.982279
), pc9 = c(-0.704416, -0.161818), pc10 = c(5.46248, -0.579022
), time = c(5, 5), '250' = c(FALSE, FALSE), '250.2' = c(FALSE,
FALSE), '250.23' = c(FALSE, FALSE), '272' = c(NA, FALSE), '272.1' = c(NA,
FALSE), '272.11' = c(NA, FALSE), '274.1' = c(FALSE, FALSE), '278' = c(FALSE,
FALSE), '278.1' = c(FALSE, FALSE), '351' = c(FALSE, FALSE), '401' = c(NA,
FALSE), '401.1' = c(NA, FALSE), '411' = c(NA, FALSE), '411.4' = c(NA,
FALSE), '411.8' = c(NA, FALSE), '454' = c(FALSE, FALSE), '454.1' = c(FALSE,
FALSE), '512.7' = c(FALSE, FALSE), '550' = c(NA, FALSE), '550.2' = c(NA,
FALSE), '550.4' = c(NA, FALSE), '740' = c(NA, FALSE), '740.1' = c(NA,
FALSE), '907' = c(FALSE, FALSE)), row.names = 1:2, class = "data.frame")
Structure:
'data.frame': 426295 obs. of 41 variables:
$ id : chr "1000016" "1000028" "1000033" "1000042" ...
$ FA_low : chr "1" "1" "0" "0" ...
$ sex : chr "F" "F" "F" "F" ...
$ age : int 56 66 64 50 69 63 42 41 62 64 ...
$ pc1 : num 125.12 -9.62 -12.53 -12.29 -11.33
$ time : num 5 5 5 5 5 5 5 5 5 5 ...
$ 250 : logi FALSE FALSE FALSE NA FALSE FALSE ...
When I run my analysis without a loop for each health outcome separately, it works fine. When I try to create a for loop with the health outcomes as iterations as follows:
for(i in 1:24){
  df.model <- na.omit(df1[c(1:17, 17+i)])
  cox.mod <- coxph(Surv(time, i) ~ FA_low + age + sex + pc1 + pc2 + pc3 + pc4 + pc5 + pc6 + pc7 + pc8 + pc9 + pc10, data = df.model)
  cox1 <- summary(cox.mod)
}
I get the following error:
Error in Surv(time, i) : Time and status are different lengths
The number of observations in these columns is the same. I am inclined to think that my for loop does not match the way the Surv() function works. I went through the documentation for Surv() in the survival package but I still can't solve this. I have seen questions and answers about for loops over 'time' but not over events. How do I create a for loop that iterates over the events in this survival analysis?
Upvotes: 0
Views: 356
Reputation: 4140
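I think the error you're seeing is related to how Surv() expects its arguments to be formatted within coxph(). It expects variables (such as column names) rather than a column position (i.e. your use of i): inside the formula, i is just a single number, so the status argument has length 1 while time has one value per row. One solution is to pass the values of each status column directly. Check this out: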
library(survival)
#> Warning: package 'survival' was built under R version 4.0.5
test1 <- list(time=c(4,3,1,1,2,2,3),
status=c(1,1,1,0,1,1,0),
x=c(0,2,1,1,1,0,0),
sex=c(0,0,0,0,1,1,1),
status2=c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE))
## This works
coxph(Surv(time, status) ~ x + strata(sex), test1)
#> Call:
#> coxph(formula = Surv(time, status) ~ x + strata(sex), data = test1)
#>
#> coef exp(coef) se(coef) z p
#> x 0.8023 2.2307 0.8224 0.976 0.329
#>
#> Likelihood ratio test=1.09 on 1 df, p=0.2971
#> n= 7, number of events= 5
## This doesn't work
coxph(Surv(time, 2) ~ x + strata(sex), test1)
#> Error in Surv(time, 2): Time and status are different lengths
## This works
coxph(Surv(time, test1[[2]]) ~ x + strata(sex), test1)
#> Call:
#> coxph(formula = Surv(time, test1[[2]]) ~ x + strata(sex), data = test1)
#>
#> coef exp(coef) se(coef) z p
#> x 0.8023 2.2307 0.8224 0.976 0.329
#>
#> Likelihood ratio test=1.09 on 1 df, p=0.2971
#> n= 7, number of events= 5
Created on 2021-09-01 by the reprex package (v2.0.1)
Note that in my example (from the survival documentation), test1 is a list. You may need to use df.model[,i] or convert df.model to a list. Also, should i in Surv() always be 18, as the 18th column contains your event data in every iteration of df.model?
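As a rough sketch of how the loop could look, the version below builds the formula from the event column's name on each iteration instead of passing a position to Surv(). The data frame toy, its values, and the names outcomes, covars, and results are made up for illustration; they stand in for df1 and its columns and are not taken from the question's data. Backticks are needed in the formula because names like 250 are not syntactic R names.

```r
library(survival)

# Toy data standing in for df1 (hypothetical values): two covariates,
# a follow-up time, and two logical event columns with numeric-looking names.
set.seed(1)
n <- 200
toy <- data.frame(
  FA_low = factor(rbinom(n, 1, 0.5)),
  age    = round(rnorm(n, 60, 8)),
  time   = rexp(n, 0.2)
)
toy[["250"]] <- rbinom(n, 1, 0.4) == 1
toy[["272"]] <- rbinom(n, 1, 0.3) == 1
toy[["272"]][1:5] <- NA   # some missing outcomes, as in the question

outcomes <- c("250", "272")     # event columns to loop over
covars   <- c("FA_low", "age")  # covariates shared by every model
results  <- vector("list", length(outcomes))
names(results) <- outcomes

for (ev in outcomes) {
  # Keep only the columns needed for this outcome, then drop incomplete rows.
  df.model <- na.omit(toy[c("time", covars, ev)])
  # Build e.g. Surv(time, `250`) ~ FA_low + age; backticks protect the name.
  fml <- as.formula(sprintf("Surv(time, `%s`) ~ %s", ev,
                            paste(covars, collapse = " + ")))
  # Store the summary so it is not overwritten on the next iteration.
  results[[ev]] <- summary(coxph(fml, data = df.model))
}

results[["250"]]$coefficients
```

Storing each summary in a named list keeps the results of all outcomes, whereas assigning to a single cox1 object inside the loop would leave only the last model. With the real data, outcomes would presumably be names(df1)[18:41] and covars would include sex and pc1 to pc10.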
Upvotes: 1