How to resolve discrepancy in 95% confidence intervals in survival function in R

Question

I'm in the process of writing some functions to extract information from the results of a survival analysis and I ran into a discrepancy between my extraction of the lower and upper survival time as specified by the 95% confidence interval and that which is reported from the package itself as a summary.

I'm using the survival package (v 2.37-7) in R (v 3.1.2).

So my problem is that sometimes my extraction of the lower and/or upper boundary of the 95% CI for the median survival time does not match what is returned when I just evaluate the results of survfit. When I inspect the data, I believe the results of survfit are wrong, it appears to be returning the boundary+1 value (again, sometimes). Here are some data that illustrate the problem.

# Fit my data stratified by gender of subject
survFit30Sex <- survfit(Surv(thirtyDaySuicides$daysFromInvestigation) ~ thirtyDaySuicides$Sex)

# Display median survival and confidence interval
survFit30Sex


Call: survfit(formula = Surv(thirtyDaySuicides$daysFromInvestigation) ~ 
thirtyDaySuicides$Sex)

                    records n.max n.start events median 0.95LCL 0.95UCL
thirtyDaySuicides$Sex=1      35    35      35     35     15       9      20
thirtyDaySuicides$Sex=2      93    93      93     93      9       6      13

survfit determines the lower and upper boundary for Sex = 1 as 9 days and 20 days respectively but when I inspect the data, it seems that the upper boundary should be 19, not 20

Here is the actual data; I'm just showing for Sex=1 as that is where the discrepancy is, I've also cut out values well before and after the critical region to make the data easier to read

Call: survfit(formula = Surv(thirtyDaySuicides$daysFromInvestigation) ~ 
    thirtyDaySuicides$Sex)

summary( thirtyDaySuicides$Sex=1 )
     time n.risk n.event survival std.err lower 95% CI upper 95% CI
    9     24       2   0.6286  0.0817      0.48725        0.811
   10     22       1   0.6000  0.0828      0.45780        0.786
   11     21       1   0.5714  0.0836      0.42890        0.761
   13     20       1   0.5429  0.0842      0.40055        0.736
   14     19       1   0.5143  0.0845      0.37272        0.710
   15     18       1   0.4857  0.0845      0.34541        0.683
   16     17       1   0.4571  0.0842      0.31861        0.656
   17     16       3   0.3714  0.0817      0.24138        0.572
   19     13       1   0.3429  0.0802      0.21673        0.542
   20     12       2   0.2857  0.0764      0.16921        0.482
   21     10       2   0.2286  0.0710      0.12437        0.420
   22      8       1   0.2000  0.0676      0.10310        0.388

As I understand it, the lower 95% CI for the median survival time is 0.34541. Searching down the survival column until finding a value < 0.34541 occurs in the row associated with a survival time of 19 (survival = 0.3429). Isn't this the upper bound? Why does survfit return an upper survival time of 20? I've automated this algorithm and most of the time I match the output from survfit but not always.

This leads me to think that either there is some strange error in the survival package (which I doubt), or I'm finding the boundary incorrectly (most likely).

--------- Update

Unfortunately I don't know how to link a data file to my question, but the data is pretty short so I can put it here. Note that I eliminated the stratification by Sex to simplify, thus this is just the data for females which is where I get the discrepancy.

It occurs to me that I was approaching this incorrectly, perhaps the 95% CI is being computed from the standard error, not looked up the way I'm thinking of it. But even with that idea I'm having similar problems. The question is more generally, how does one pull out the Xth percentile survival time with it's corresponding 95% CI in units of time from a survfit object?

Here is the survival input data via dput, and then an unstructured copy below that.

structure(list(daysFromInvestigation = c(27L, 27L, 10L, 20L, 
15L, 21L, 27L, 1L, 9L, 22L, 29L, 14L, 4L, 19L, 7L, 3L, 2L, 7L, 
21L, 4L, 17L, 20L, 16L, 2L, 9L, 7L, 17L, 2L, 17L, 26L, 25L, 11L, 
3L, 13L, 27L), censored = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1)), class = "data.frame", row.names = c(NA, -35L), .Names = c("daysFromInvestigation", 
"censored"))

       daysFromInvestigation censored
1                     27        1
2                     27        1
3                     10        1
4                     20        1
5                     15        1
6                     21        1
7                     27        1
8                      1        1
9                      9        1
10                    22        1
11                    29        1
12                    14        1
13                     4        1
14                    19        1
15                     7        1
16                     3        1
17                     2        1
18                     7        1
19                    21        1
20                     4        1
21                    17        1
22                    20        1
23                    16        1
24                     2        1
25                     9        1
26                     7        1
27                    17        1
28                     2        1
29                    17        1
30                    26        1
31                    25        1
32                    11        1
33                     3        1
34                    13        1
35                    27        1

How to resolve discrepancy in 95% confidence intervals in survival function in R

Answers (1)

Related Questions