How to know if the data is censored in a survival analysis using R

Question

I have a data set that looks like this (a nonsense example):

id <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
year <- c(1990, 1991, 1992, 1989, 1990, 1991, 1992, 1993, 1989, 1990, 1992, 1993)
event <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1)

df <- cbind(id, year, event)

There are supposed to be continuous observations for all three ids from 1989 until death. However, as you can see, id 1 is left-censored (no information from the start), id 2 is right-censored (no info from start or finish), and id 3 has gaps in observation (info from start and finish, but with gaps). In a small table this is easy to see, but with large data sets it becomes more difficult.

Edit: Is there a way of grouping by id and creating a summary table with information on the completeness of the data, something like:

id   left-censored   right-censored   gaps in obs. 
1    1               0                0             
2    0               1                0
3    0               0                1

Chris · Accepted Answer

You can group your data.frame (I use a tibble) by ID with dplyr and then create new variables that indicate, for each ID, whether or not the first year of observation was 1989, whether the person died under observation, and whether or not the number of rows per ID is equal to the time span (max_year - min_year + 1). In this case I would argue that ID 2 is not left-censored, as her first year of observation is 1989, which you define as the starting year.

library(tibble)
library(dplyr)


id <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
year <- c(1990, 1991, 1992, 1989, 1990, 1991, 1992, 1993, 1989, 1990, 1992, 1993)
deceased <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1) 

df <- tibble(id, year, deceased)

start_year <- 1989

df %>%
  group_by(id) %>%
  mutate(left_censored = min(year) > start_year,      ## left censored if the first observed year is after start_year (1989)
         right_censored = max(deceased) == 0,         ## right censored if no death was observed
         has_gaps = n() < max(year) - min(year) + 1)  ## has gaps if there are fewer rows than years spanned

The result:

# A tibble: 12 x 6
# Groups:   id [3]
      id  year deceased left_censored right_censored has_gaps
   <dbl> <dbl>    <dbl> <lgl>         <lgl>          <lgl>
 1     1  1990        0 TRUE          FALSE          FALSE   
 2     1  1991        0 TRUE          FALSE          FALSE   
 3     1  1992        1 TRUE          FALSE          FALSE   
 4     2  1989        0 FALSE         TRUE           FALSE   
 5     2  1990        0 FALSE         TRUE           FALSE   
 6     2  1991        0 FALSE         TRUE           FALSE   
 7     2  1992        0 FALSE         TRUE           FALSE   
 8     2  1993        0 FALSE         TRUE           FALSE   
 9     3  1989        0 FALSE         FALSE          TRUE    
10     3  1990        0 FALSE         FALSE          TRUE    
11     3  1992        0 FALSE         FALSE          TRUE    
12     3  1993        1 FALSE         FALSE          TRUE
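
To see why has_gaps is TRUE for id 3: it has 4 rows, but the span 1993 - 1989 + 1 covers 5 years, and 4 < 5. If you also want to know which years are missing, a minimal base R sketch could look like the following. The name missing_years is only illustrative, and the sketch assumes that years are whole numbers.

```r
# Same toy data as above.
id   <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
year <- c(1990, 1991, 1992, 1989, 1990, 1991, 1992, 1993, 1989, 1990, 1992, 1993)

# For each id, compare the full run of years between its first and last
# observation with the years actually present, and keep the difference.
missing_years <- lapply(split(year, id), function(y) {
  setdiff(seq(min(y), max(y)), y)
})

missing_years[["3"]]  # 1991 is the only year missing for id 3
```

Ids without gaps get an empty vector, so lengths(missing_years) > 0 should reproduce the has_gaps flag per id.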

Edit: If you want an overview you can add:

df %>%
  group_by(id) %>%
  mutate(left_censored = min(year) > start_year,      ## left censored if the first observed year is after start_year (1989)
         right_censored = max(deceased) == 0,         ## right censored if no death was observed
         has_gaps = n() < max(year) - min(year) + 1) %>%  ## has gaps if there are fewer rows than years spanned
  dplyr::distinct(id, left_censored, right_censored, has_gaps) %>%
  ungroup() %>%
  summarise(left_censored = sum(left_censored), right_censored = sum(right_censored), has_gaps = sum(has_gaps))

And get:

# A tibble: 1 x 3
  left_censored right_censored has_gaps
          <int>          <int>    <int>
1             1              1        1

As I mentioned before: Here ID 2 is not considered left censored, as her starting date is 1989.

Edit2: If you take away the ungroup() you get the overview you asked for:

df %>%
  group_by(id) %>%
  mutate(left_censored = min(year) > start_year,      ## left censored if the first observed year is after start_year (1989)
         right_censored = max(deceased) == 0,         ## right censored if no death was observed
         has_gaps = n() < max(year) - min(year) + 1) %>%  ## has gaps if there are fewer rows than years spanned
  dplyr::distinct(id, left_censored, right_censored, has_gaps) %>%
  summarise(left_censored = sum(left_censored), right_censored = sum(right_censored), has_gaps = sum(has_gaps))

and get:

     id left_censored right_censored has_gaps
  <dbl>         <int>          <int>    <int>
1     1             1              0        0
2     2             0              1        0
3     3             0              0        1
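
As a possible shortcut, not part of the code above: since the three flags are constant within each id, calling summarise() directly on the grouped data should give the same per-id table without the mutate() and distinct() steps. A sketch under that assumption (overview is an illustrative name, and as.integer() only turns TRUE/FALSE into 1/0):

```r
library(dplyr)

# Same toy data as above, here as a plain data.frame.
id <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3)
year <- c(1990, 1991, 1992, 1989, 1990, 1991, 1992, 1993, 1989, 1990, 1992, 1993)
deceased <- c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1)
df <- data.frame(id, year, deceased)

start_year <- 1989

# One row per id, with each flag coded as 1 (yes) or 0 (no).
overview <- df %>%
  group_by(id) %>%
  summarise(
    left_censored  = as.integer(min(year) > start_year),               # first year is after start_year
    right_censored = as.integer(max(deceased) == 0),                   # no death observed
    has_gaps       = as.integer(n() < max(year) - min(year) + 1)       # fewer rows than years spanned
  )

overview
```

Note that the n() comparison relies on each id having at most one row per year. If duplicated id-year rows are possible, n_distinct(year) in place of n() would probably be the safer count.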
