I am running a competing risks analysis in R using Fine & Gray regression. Here is my code, with death as the competing risk:
fg.model <- crr(ftime,fstatus,cov,failcode=1,cencode=0)
ftime is a numeric variable ranging from 1 to 180 days that gives each patient's follow-up time until death (fstatus == 1). Patients who are still alive at the end of follow-up have ftime equal to 180 and fstatus equal to 0. For example, a person who dies after 30 days of follow-up has ftime = 30 and fstatus = 1, while a person who survives the whole follow-up (max 180 days) has ftime = 180 and fstatus = 0. fstatus is also a numeric variable.
The parameter "cov" is a data frame with two covariates (age and sex, converted to factors). failcode is equal to 1 because death is the competing event, and cencode is equal to 0 because survivors are considered censored.
I have the following error message:
# Error in crr(ftime,fstatus,cov,failcode=1, :
#   NA/NaN/Inf in foreign function call (arg 4)
# NAs introduced by coercion
Since I have no missing data in my database, what can explain this error message and how can I solve it?
I already tried na.omit, complete.cases, and other code to make sure there is no missing data. I also checked the structure of the data: time and status are numeric, and the covariates in cov are converted to factors.
Here is a code reproducing my dataset and the error message:
# Set the sample size
n <- 8076
# Create a variable for follow-up time
time <- c(rep(180, 6533), sample(1:179, n - 6533, replace = TRUE))
# Create a variable for status: 0 = alive/censored, 1 = death
status <- ifelse(time == 180, 0, 1)
# Create age
age <- sample(18:90, n, replace = TRUE)
# Create gender
sex <- sample(c("male", "female"), n, replace = TRUE)
# Combine
df <- data.frame(time, status, age, sex)
# Create cov
cov <- subset(df, select = c("age", "sex"))
cov$sex <- as.factor(cov$sex)
# Run Fine Gray model
library(cmprsk)
fg.mod <- crr(df$time, df$status, cov, failcode = 1, cencode = 0)
Reputation: 20492
model.matrix()
The solution to the question in your comment - what happens if you have a factor with several levels - is to do this:
# Create factor with three levels
cov$income <- factor(sample(c("high", "med", "low"), nrow(cov), replace = TRUE))
# Define factors for model spec
factors <- c("sex", "income")
model_spec <- reformulate(c("age", factors)) # ~age + sex + income
covariates_matrix <- model.matrix(
  model_spec,
  data = cov,
  contrasts.arg = lapply(cov[factors], contrasts)
)[, -1] # first column is constant intercept (1)
head(covariates_matrix, 3)
#   age sexmale incomelow incomemed
# 1  60       1         0         1
# 2  39       0         1         0
# 3  23       0         0         0
crr(df$time, df$status, covariates_matrix, failcode = 1, cencode = 0)
# convergence:  TRUE
# coefficients:
#        age    sexmale  incomelow  incomemed
#  0.0002871  0.0506400 -0.0391500 -0.0098910
# standard errors:
# [1] 0.001193 0.050910 0.062340 0.062020
# two-sided p-values:
#       age   sexmale incomelow incomemed
#      0.81      0.32      0.53      0.87
As you can see, your factor variables now have a coefficient for each non-reference level, rather than being treated as continuous.
The difficulty you have is that the cmprsk::crr() function does not support model formulas. Instead it takes a matrix, which means that all covariates are coerced to the same type. In your case, the factor variables cause everything to be coerced to character, and converting that back to numbers is what produces the NAs introduced by coercion. As the cmprsk docs state:
The model.matrix function can be used to generate suitable matrices of covariates from factors
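To see the coercion problem in isolation, here is a minimal base-R sketch. The data frame `cov_demo` is a made-up stand-in for your `cov`, and the `as.matrix()` / `as.numeric()` steps are my assumption about what happens to the covariates inside `crr()`, not a copy of its source:

```r
# Hypothetical three-row stand-in for the question's `cov` data frame
cov_demo <- data.frame(
  age = c(60L, 39L, 23L),
  sex = factor(c("male", "female", "female"))
)

# A data frame mixing numeric and factor columns becomes a character matrix
m_chr <- as.matrix(cov_demo)
typeof(m_chr)

# Forcing that character matrix to numeric turns the factor labels into NA
# (this is where "NAs introduced by coercion" comes from)
m_num <- suppressWarnings(as.numeric(m_chr))
anyNA(m_num)

# model.matrix() expands the factor into numeric dummy columns instead;
# [, -1] drops the intercept column
m_ok <- model.matrix(~ age + sex, data = cov_demo)[, -1, drop = FALSE]
typeof(m_ok)
anyNA(m_ok)
```

So the NAs are created during the conversion, which is why na.omit and complete.cases on the original data find nothing to remove.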
In your original question, which had a binary factor, we could just do cov$sex <- cov$sex=="male", and get a binary covariate which is easy to interpret. However, with more levels, we need to use model.matrix(). If you want to change how the reference level is represented, see this question.
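As a rough illustration of both points, here is a small sketch using a made-up `sex` vector. Using `relevel()` to pick the reference level is one common approach and is my suggestion here, not necessarily what the linked question recommends:

```r
# Hypothetical binary factor
sex <- factor(c("male", "female", "female", "male"))

# Binary shortcut: the comparison gives TRUE/FALSE, stored here as 1/0
sex_male <- as.numeric(sex == "male")

# By default the first level ("female") is the reference,
# so the dummy column describes "male"
mm_default <- model.matrix(~ sex, data = data.frame(sex = sex))
colnames(mm_default)

# relevel() makes "male" the reference, so the dummy column describes "female"
d_releveled <- data.frame(sex = relevel(sex, ref = "male"))
mm_releveled <- model.matrix(~ sex, data = d_releveled)
colnames(mm_releveled)
```

The fitted model is the same either way; only the sign and labelling of the coefficient change with the reference level.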