Reputation: 9624
I have a data set with 6 clusters, each containing 48 (possibly censored, in which case event = 0
) survival times. The x
column contains a cluster-specific explanatory variable. I try to describe that data with a gamma frailty model as follows
library(survival)
mod <- coxph(Surv(time, event) ~
x + frailty.gamma(cluster, eps=1e-10, method="em", sparse=0),
outer.max=1000, iter.max=10000,
data=data)
Here is the error message:
Error in if (history[2, 3] < (history[1, 3] + 1)) theta <- mean(history[1:2, :
missing value where TRUE/FALSE needed
Does anyone have an idea on how to debug?
Upvotes: 9
Views: 3463
Reputation: 121626
Changing the method of the variance of the random effect, seems to fix the problem.
e.g:
mod.aic <- coxph(Surv(time, event) ~
x + frailty.gamma(cluster, eps=1e-10, method="aic", sparse=0),
outer.max=1000, iter.max=10000,
data=dat)
plot(survfit(mod.aic), col=4)
Maybe this don't answer exactly your question , but when I remove any cluster e.g:
par(mfrow=c(2,3))
res <- sapply( 1:6 , function(x) {
mod <-
coxph(Surv(time, event) ~
x + frailty.gamma(cluster, eps=1e-10, method="em", sparse=0),
outer.max=1000, iter.max=10000,
data=subset(dat,cluster != x)
)
plot(survfit(mod), col=4,main= paste ('cluster', x, 'is removed'))
legend(10,1,mod$iter)
})
the coxph converge and I have the same result for all samples.
I don't have enough information about your data for further analysis but I Tried to do some comparison between different clusters.
library(ggplot2
qplot(data = dat, x=time , y = x , facets= event~cluster)
I notice that 3 groups :
Upvotes: 8
Reputation: 9624
Here is the answer I was given by Terry Therneau (the author of coxph).
I looked at your data:
> table(x, cluster)
1 2 3 4 5 6
0 0 48 0 48 48 0
1 48 0 48 0 0 48
Your covariate "x" is perfectly predicted by the cluster variable. If you fit a fixed effects model: coxph(Surv(time, event) ~ factor(cluster) +x)
then the "x" variable is declared redundant. When the variance of the random effect is sufficiently large, the same happens in the gamma model when the variance is sufficiently large. Your model approaches this limit, and the solution fails. As mentioned in the manual page, the coxme function is now preferred.
Last, your particular error message is caused by an invalid value for "sparse". I'll add a check to the program. You likely want "sparse=10" to force non-sparse computation.
Upvotes: 3
Reputation: 18323
The problem is with the data; you cannot separate cluster-specific effects from your x
if all of the x
are the same in each cluster
.
Looking at the distribution of x
in your data by cluster we can see this:
table(data$x,data$cluster)
1 2 3 4 5 6
0 0 48 0 48 48 0
1 48 0 48 0 0 48
Which is I think what you mean by cluster-specific explanatory variable. This will be a problem in any model because x
is collinear (I think that is the word) with cluster
. Even trying the most basic model:
data$cluster<-as.factor(data$cluster)
mod <- coxph(Surv(time, event) ~ x + cluster, data=data)
Warning message:
In coxph(Surv(time, event) ~ x + cluster, data = data) :
X matrix deemed to be singular; variable 5
The matrix is singular because there is no way to differentiate between the effects of cluster
and x
.
If you have no other variables besides cluster
and x
, then all you can really do is run the effect of the cluster alone:
data$cluster<-as.factor(data$cluster)
coxph(Surv(time, event) ~ cluster,data=data)
Call:
coxph(formula = Surv(time, event) ~ cluster, data = data)
coef exp(coef) se(coef) z p
cluster2 1.070 2.92 0.382 2.80 5.1e-03
cluster3 0.499 1.65 0.384 1.30 1.9e-01
cluster4 1.705 5.50 0.365 4.68 2.9e-06
cluster5 2.058 7.83 0.370 5.56 2.7e-08
cluster6 4.415 82.69 0.399 11.06 0.0e+00
Consider that both cluster1
and cluster6
have the same value of x
, and the hazard ratio between them is 83. Perhaps cluster6
was different, perhaps x
acts differently within cluster6
: you can't tell the difference because of the way the data is structured.
Upvotes: 4