I am a bit lost now. I have the following issue: we began collecting data in 2018 on startups and on when they made sales. Now I want to estimate a model explaining the time between founding and the first sale.
As I understand it, I have left-censored data (the first sale occurred before the founding date we have on record), right-censored data (the startup was founded but has no sale yet), and truncation (startups were founded at different points in time).
The data looks like this:
founding_date <- c("2018-01-10", "2023-01-11", "2022-07-26", "2021-11-23", "2020-08-05")
first_sale_date <- c("2018-05-01", "2023-01-01", "2022-10-01", "2020-01-01", NA)
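To make the sample reproducible, the two vectors can be assembled into a data frame. This is only a minimal sketch: it assumes the sample values are ISO-formatted (`%Y-%m-%d`), that the missing sale is a real `NA` rather than the string `"NA"`, and it leaves out the predictor columns, which are not shown in the question.

```r
# Sample values copied from the question; predictor columns are omitted.
founding_date <- c("2018-01-10", "2023-01-11", "2022-07-26", "2021-11-23", "2020-08-05")
first_sale_date <- c("2018-05-01", "2023-01-01", "2022-10-01", "2020-01-01", NA)

# Assumption: the strings are ISO dates, so the format is "%Y-%m-%d".
# A format that does not match the strings makes as.Date() return NA.
data <- data.frame(
  founding_date   = as.Date(founding_date, format = "%Y-%m-%d"),
  first_sale_date = as.Date(first_sale_date, format = "%Y-%m-%d")
)

# Inspect the column classes and the first values.
str(data)
```

If the real file stores dates as day/month/year, the `format` argument would need to change accordingly.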
I have run the following code for data preparation and was able to create a survival object.
library(dplyr)     # provides %>% and mutate()
library(survival)  # provides Surv() and survreg()
# Convert dates to Date format
data$founding_date <- as.Date(data$founding_date, format="%d/%m/%Y")
data$first_sale_date <- as.Date(data$first_sale_date, format="%d/%m/%Y")
# Earliest founding date
data_collection_date <- min(data$founding_date, na.rm = TRUE)
# Start_time and end_time in days since study start
data <- data %>%
  mutate(
    start_time = as.numeric(difftime(founding_date, data_collection_date, units = "days")),
    end_time = as.numeric(difftime(first_sale_date, data_collection_date, units = "days")),
    # Right-censoring: If no sale has happened, status = 0
    status = ifelse(is.na(end_time), 0, 1),
    # Left-censoring: First sale before founding
    left_censored = ifelse(end_time < start_time, 1, 0)
  )
# Correct left-censored cases by setting start_time to NA
data$start_time[data$left_censored == 1] <- NA
# Correct invalid end_time cases where status = 1 but end_time <= 0
data$end_time[data$status == 1 & (is.na(data$end_time) | data$end_time <= 0)] <- NA
# Define Survival Object
Surv.Obj <- Surv(data$start_time, data$end_time, type = 'interval2')
The survival object looks like this for the data provided. Note that the earliest data collection date in the full data was 2014-01-01, which is why the day counts are so large:
[1470, 1581]
3287-
[3128, 3195]
2191-
2408+
However, whenever I fit the following model (no matter the distribution), it fails with an error. The predictors are all fully observed; there are no missing values.
interval_model <-
  survreg(Surv.Obj ~ n_eco_players + founding_team_diversity + urban_rural_binary,
          data = data,
          dist = "lognormal")
The error is:
Error in survreg(Surv.Obj ~ n_eco_players + founding_team_diversity + :
Invalid survival times for this distribution
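One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: distributions such as the lognormal work on the log of time, so a time of zero or below cannot be used. Because `start_time` is measured from the earliest founding date, the earliest-founded startup would get `start_time = 0`. The sketch below uses hypothetical toy values (not the real data) and base R only to flag such rows, plus rows where the interval bounds are reversed.

```r
# Hypothetical toy values in days since study start (not the real data).
# NA in start_time marks left-censoring, NA in end_time marks right-censoring.
start_time <- c(0, NA, 3128, NA, 2408)
end_time   <- c(111, 3287, 3195, 2191, NA)

# log() is not finite at zero or below, so flag observed times that are <= 0.
bad_start <- which(!is.na(start_time) & start_time <= 0)
bad_end   <- which(!is.na(end_time) & end_time <= 0)

# Interval bounds should satisfy left <= right when both are observed.
reversed <- which(!is.na(start_time) & !is.na(end_time) & start_time > end_time)

# Row indices that would need attention before fitting.
print(list(bad_start = bad_start, bad_end = bad_end, reversed = reversed))
```

Running the same three checks on the real `data$start_time` and `data$end_time` columns would show whether any row falls into one of these cases.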
If I need to provide something else, please let me know. I hope somebody can help me.