Andy

Interval Survival Model in R with left-censored, right-censored and truncated data

I am a bit lost now. I have the following issue: in 2018 we began collecting data on startups and on when they made sales. Now I want to estimate a model explaining the time between a startup's founding and its first sale.

As I understand it, my data contain three kinds of incomplete observations: left-censored (the first sale happened before the recorded founding date), right-censored (the startup was founded but has not made a sale yet) and truncated (the startups were founded at different points in time).
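For reference, here is a minimal sketch of how `Surv(..., type = "interval2")` from the survival package encodes these cases. An `NA` lower bound means left-censored, an `NA` upper bound means right-censored, equal bounds mean an exactly observed event, and two different finite bounds mean interval-censored. The numbers are made up and only illustrate the encoding. Note that `interval2` has no slot for truncation.

```r
library(survival)

# Made-up bounds in days, one row per case
lower <- c(NA, 5, 3, 2)   # row 1: NA lower bound -> left-censored (event at or before day 4)
upper <- c(4, NA, 3, 6)   # row 2: NA upper bound -> right-censored (no event by day 5)
                          # row 3: equal bounds   -> exact event at day 3
                          # row 4: [2, 6]         -> event somewhere between day 2 and day 6

s <- Surv(lower, upper, type = "interval2")
print(s)  # left-censored values print with "-", right-censored with "+", intervals as [a, b]

# For interval-type Surv objects the third column holds the status code:
# 0 = right-censored, 1 = exact event, 2 = left-censored, 3 = interval-censored
status <- unclass(s)[, 3]
print(status)
```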

The data looks like this:

data <- data.frame(
  founding_date   = c("2018-01-10", "2023-01-11", "2022-07-26", "2021-11-23", "2020-08-05"),
  first_sale_date = c("2018-05-01", "2023-01-01", "2022-10-01", "2020-01-01", NA)
)

I used the following code to prepare the data, and it creates the survival object without errors.

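# Packages: dplyr for %>% and mutate(), survival for Surv() and survreg()
library(dplyr)
library(survival)

# Convert dates to Date format (the sample uses YYYY-MM-DD; use "%d/%m/%Y" if the raw file is day/month/year)
data$founding_date <- as.Date(data$founding_date, format = "%Y-%m-%d")
data$first_sale_date <- as.Date(data$first_sale_date, format = "%Y-%m-%d")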

# Earliest founding date
data_collection_date <- min(data$founding_date, na.rm = TRUE)

# Start_time and end_time in days since study start
data <- data %>%
  mutate(
    start_time = as.numeric(difftime(founding_date, data_collection_date, units = "days")),
    end_time = as.numeric(difftime(first_sale_date, data_collection_date, units = "days")),
    
    # Right-censoring: If no sale has happened, status = 0
    status = ifelse(is.na(end_time), 0, 1),
    
    # Left-censoring: First sale before founding
    left_censored = ifelse(end_time < start_time, 1, 0)
  )

# Correct left-censored cases by setting start_time to NA
data$start_time[data$left_censored == 1] <- NA

# Correct invalid end_time cases where status = 1 but end_time <= 0
data$end_time[data$status == 1 & (is.na(data$end_time) | data$end_time <= 0)] <- NA

# Define Survival Object
Surv.Obj <- Surv(data$start_time, data$end_time, type = 'interval2')

For the sample data, the survival object looks like this. Note that in the full data set the earliest founding date (data_collection_date) is 2014-01-01, which is why the day counts are so large:

[1470, 1581] 
3287-        
[3128, 3195] 
2191-        
2408+              
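In this set-up both bounds are calendar times (days since `data_collection_date`), so `[1470, 1581]` tells `survreg()` that the event happened at some point between day 1470 and day 1581 after the earliest founding date, rather than that the first sale came 111 days after founding. If the quantity of interest is the time from founding to first sale, one option is to measure each startup from its own founding date. The following is a minimal sketch on the five sample rows, assuming ISO dates. `observation_end` (the last date on which sales were recorded) is a hypothetical value, because the question does not state it.

```r
# Sample rows from the question
founding_date   <- as.Date(c("2018-01-10", "2023-01-11", "2022-07-26", "2021-11-23", "2020-08-05"))
first_sale_date <- as.Date(c("2018-05-01", "2023-01-01", "2022-10-01", "2020-01-01", NA))

# Hypothetical end of the observation window (not given in the question)
observation_end <- as.Date("2024-01-01")

# Days from each startup's own founding date to its first sale (NA = no sale yet)
days_to_sale <- as.numeric(first_sale_date - founding_date)

# Days each startup has been observed; only relevant as the censoring time for rows without a sale
days_observed <- as.numeric(observation_end - founding_date)

# Flag the situations described in the question
sale_before_founding <- !is.na(days_to_sale) & days_to_sale <= 0
no_sale_yet          <- is.na(days_to_sale)

overview <- data.frame(founding_date, first_sale_date, days_to_sale,
                       days_observed, sale_before_founding, no_sale_yet)
print(overview)
```

Rows flagged `sale_before_founding` have a duration of zero or less, which a log-scale distribution cannot use as an event time. Whether to correct the dates, drop those rows, or treat them as left-censored at some small positive bound is a modelling decision that the data alone cannot settle.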

However, whenever I fit the following model, it fails, no matter which distribution I choose. All predictors are fully observed; there are no missing values.

interval_model <- survreg(Surv.Obj ~ n_eco_players + founding_team_diversity + urban_rural_binary,
                          data = data,
                          dist = "lognormal")

The error is:

Error in survreg(Surv.Obj ~ n_eco_players + founding_team_diversity +  : 
  Invalid survival times for this distribution
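The lognormal distribution (like the Weibull, exponential and log-logistic options in `survreg()`) is fitted on the log of the survival times, so every finite bound has to be strictly positive. In the preparation code, `data_collection_date` is the minimum founding date. As a result, the earliest-founded startup gets `start_time = 0` (unless its sale predates its founding), and log(0) is -Inf. A bound of zero or below is therefore a likely source of this error. The following is a quick base-R check for such rows before calling `survreg()`. `bad_bounds()` is a hypothetical helper. The example uses the bounds printed above plus one illustrative row for the earliest-founded startup, whose upper bound is made up.

```r
# Hypothetical helper: indices of rows whose finite bounds are not strictly positive,
# which a log-scale distribution such as "lognormal" cannot handle
bad_bounds <- function(lower, upper) {
  nonpos <- function(x) !is.na(x) & x <= 0   # NA means "censored on that side", which is fine
  which(nonpos(lower) | nonpos(upper))
}

# Bounds printed in the question, plus an illustrative last row for the
# earliest-founded startup (start_time is 0 by construction; upper bound made up)
lower <- c(1470, NA, 3128, NA, 2408, 0)
upper <- c(1581, 3287, 3195, 2191, NA, 500)

bad_bounds(lower, upper)  # → 6
```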

If you need any other information, please let me know. I hope somebody can help.
