Marco
Marco

Reputation: 9624

Fitting a gamma frailty model in R (coxph)

I have a data set with 6 clusters, each containing 48 (possibly censored, in which case event = 0) survival times. The x column contains a cluster-specific explanatory variable. I try to describe that data with a gamma frailty model as follows

 library(survival)

 mod <- coxph(Surv(time, event) ~ 
   x + frailty.gamma(cluster, eps=1e-10, method="em", sparse=0),
              outer.max=1000, iter.max=10000,
              data=data)

Here is the error message:

Error in if (history[2, 3] < (history[1, 3] + 1)) theta <- mean(history[1:2,  : 
  missing value where TRUE/FALSE needed

Does anyone have an idea on how to debug?

Upvotes: 9

Views: 3463

Answers (3)

agstudy
agstudy

Reputation: 121626

Alternative solution : Changing the variance factor

Changing the method of the variance of the random effect, seems to fix the problem.

e.g:

mod.aic <- coxph(Surv(time, event) ~ 
               x + frailty.gamma(cluster, eps=1e-10, method="aic", sparse=0),
             outer.max=1000, iter.max=10000,
             data=dat)

plot(survfit(mod.aic), col=4)

enter image description here

ddd hoc solution: works if we remove one cluster

Maybe this don't answer exactly your question , but when I remove any cluster e.g:

par(mfrow=c(2,3))
res <- sapply( 1:6 , function(x) {
                      mod <- 
                        coxph(Surv(time, event) ~ 
                        x + frailty.gamma(cluster, eps=1e-10, method="em", sparse=0),
                        outer.max=1000, iter.max=10000,
                        data=subset(dat,cluster != x)
                        )
                     plot(survfit(mod), col=4,main= paste ('cluster', x, 'is removed'))
                     legend(10,1,mod$iter)
              })

the coxph converge and I have the same result for all samples.

enter image description here

I don't have enough information about your data for further analysis but I Tried to do some comparison between different clusters.

library(ggplot2
qplot(data = dat, x=time , y = x , facets= event~cluster)

enter image description here

I notice that 3 groups :

  1. clusters 1,3,5 : events uniformally distributed
  2. clusters 2 ,4 : events just for small times.
  3. cluster 6 : amazing one ( only event 1)

Upvotes: 8

Marco
Marco

Reputation: 9624

Here is the answer I was given by Terry Therneau (the author of coxph).

I looked at your data:

> table(x, cluster)
     1  2  3  4  5  6
  0  0 48  0 48 48  0
  1 48  0 48  0  0 48

Your covariate "x" is perfectly predicted by the cluster variable. If you fit a fixed effects model: coxph(Surv(time, event) ~ factor(cluster) +x)

then the "x" variable is declared redundant. When the variance of the random effect is sufficiently large, the same happens in the gamma model when the variance is sufficiently large. Your model approaches this limit, and the solution fails. As mentioned in the manual page, the coxme function is now preferred.

Last, your particular error message is caused by an invalid value for "sparse". I'll add a check to the program. You likely want "sparse=10" to force non-sparse computation.

Upvotes: 3

nograpes
nograpes

Reputation: 18323

The problem is with the data; you cannot separate cluster-specific effects from your x if all of the x are the same in each cluster.

Looking at the distribution of x in your data by cluster we can see this:

table(data$x,data$cluster)
     1  2  3  4  5  6
  0  0 48  0 48 48  0
  1 48  0 48  0  0 48

Which is I think what you mean by cluster-specific explanatory variable. This will be a problem in any model because x is collinear (I think that is the word) with cluster. Even trying the most basic model:

data$cluster<-as.factor(data$cluster)
mod <- coxph(Surv(time, event) ~ x + cluster, data=data)

Warning message:
In coxph(Surv(time, event) ~ x + cluster, data = data) :
  X matrix deemed to be singular; variable 5

The matrix is singular because there is no way to differentiate between the effects of cluster and x.

If you have no other variables besides cluster and x, then all you can really do is run the effect of the cluster alone:

data$cluster<-as.factor(data$cluster)
coxph(Surv(time, event) ~ cluster,data=data)

Call:
coxph(formula = Surv(time, event) ~ cluster, data = data)


          coef exp(coef) se(coef)     z       p
cluster2 1.070      2.92    0.382  2.80 5.1e-03
cluster3 0.499      1.65    0.384  1.30 1.9e-01
cluster4 1.705      5.50    0.365  4.68 2.9e-06
cluster5 2.058      7.83    0.370  5.56 2.7e-08
cluster6 4.415     82.69    0.399 11.06 0.0e+00

Consider that both cluster1 and cluster6 have the same value of x, and the hazard ratio between them is 83. Perhaps cluster6 was different, perhaps x acts differently within cluster6: you can't tell the difference because of the way the data is structured.

Upvotes: 4

Related Questions