Reputation:
There is a great post about the interpretation of the predict.coxph()
output. However, I keep getting different results comparing the output from predict.coxph
, simPH
and the formula for relative risk. Since my hypothesis includes a quadratic effect, I am going to include a polynomial with power 2 in my example.
I use the example from this post.
data("lung")
Predicting relative risk with predict()
# Defining the quadratic predictor
lung$meal.cal_q <- lung$meal.cal^2
# conduct a cox regression with the predictor meal.cal, its quadratic version and some covariates.
cox_mod <- coxph(Surv(time, status) ~
ph.karno + pat.karno + meal.cal + meal.cal_q,
data = lung)
# a vector of fitted values to predict for
meal.cal_new <- seq(min(lung$meal.cal, na.rm= TRUE), max(lung$meal.cal,
na.rm= TRUE), by= 1)
# a vector of fitted values to predict for, the quadratic effect
meal.cal_q_new <- meal.cal_new^2
# the length of the vector with the values to predict for
n <- length(meal.cal_new)
# a dataframe with all the values to predict for
lung_new <- data.frame(ph.karno= rep(mean(lung$ph.karno, na.rm= TRUE), n),
pat.karno= rep(mean(lung$pat.karno, na.rm= TRUE), n),
meal.cal= meal.cal_new,
meal.cal_q = meal.cal_q_new)
# predict the relative risk
lung_new$rel_risk <- predict(cox_mod, lung_new, type= "risk")
Predicting the relative risk with the formula (see the post mentioned above)
# Defining the quadratic predictor
lung$meal.cal_q <- lung$meal.cal^2
# run a cox regression with the predictor meal.cal, its quadratic version and some covariates.
cox_mod <- coxph(Surv(time, status) ~
ph.karno + pat.karno + meal.cal + meal.cal_q,
data = lung)
# a vector of fitted values to predict for
meal.cal_new <- seq(min(lung$meal.cal, na.rm= TRUE), max(lung$meal.cal,
na.rm= TRUE), by= 1)
# a vector of fitted values to predict for, the quadratic effect
meal.cal_q_new <- meal.cal_new^2
# length of the vector to predict for
n <- length(meal.cal_new)
# A dataframe with the values to make the prediction for
lung_new2 <- data.frame(
ph.karno= rep(mean(lung$ph.karno, na.rm= TRUE), n),
pat.karno= rep(mean(lung$pat.karno, na.rm= TRUE), n),
meal.cal= meal.cal_new,
meal.cal_q = meal.cal_q_new)
# A dataframe with the values to compare the prediction with
lung_new_mean <- data.frame(
ph.karno= rep(mean(lung$ph.karno, na.rm= TRUE), n),
pat.karno= rep(mean(lung$pat.karno, na.rm= TRUE), n),
meal.cal= rep(mean(lung$meal.cal, na.rm= TRUE), n),
meal.cal_q = rep(mean(lung$meal.cal_q, na.rm= TRUE), n))
# extract the coefficients
coefCPH <- coef(cox_mod)
# make the prediction for the values of interest
cox_risk <-
exp(coefCPH["ph.karno"] * lung_new2[ , "ph.karno"] +
coefCPH["pat.karno"] * lung_new2[ , "pat.karno"] +
coefCPH["meal.cal"] * lung_new2[ , "meal.cal"] +
coefCPH["meal.cal_q"] * lung_new2[ , "meal.cal_q"])
# make the predictions for the values to compare with
cox_risk_mean <-
exp(coefCPH["ph.karno"] * lung_new_mean[ , "ph.karno"] +
coefCPH["pat.karno"] * lung_new_mean[ , "pat.karno"] +
coefCPH["meal.cal"] * lung_new_mean[ , "meal.cal"] +
coefCPH["meal.cal_q"] * lung_new_mean[ , "meal.cal_q"])
# calculate the relative risk
lung_new2$rel_risk <- unlist(cox_risk)/ unlist(cox_risk_mean)
Now the plot with the predicted relative risk using predict()
and using the formula:
ggplot(lung_new, aes(meal.cal, rel_risk)) +
geom_smooth() +
geom_smooth(data= lung_new2, col= "red")
The plot shows that the predictions are different. I do not understand why this is the case, although the mentioned post shows that the predict function and the formula should give the same result.
Because of this confusion I tried to solve the problem with the simPH
package. Here is what I did:
# Defining the quadratic predictor
lung$meal.cal_q <- lung$meal.cal^2
# run a cox regression with the predictor, its quadratic version and some covariates.
cox_mod <- coxph(Surv(time, status) ~
ph.karno + pat.karno + meal.cal + meal.cal_q,
data = lung)
# a vector of fitted values to predict for
meal.cal_new <- seq(min(lung$meal.cal, na.rm= TRUE),
max(lung$meal.cal, na.rm= TRUE), by= 1)
# length of the vector to predict for
n <- length(meal.cal_new)
# A vector with the values to compare the prediction with
meal.cal_new_mean <- rep(mean(lung$meal.cal, na.rm= TRUE), n)
# running 100 simulations per predictor value with coxsimPoly
Sim <- coxsimPoly(obj= cox_mod, b = "meal.cal", pow = 2,
qi = "Relative Hazard",
Xj = meal.cal_new,
Xl = meal.cal_new_mean,
ci = .95,
nsim = 100,
extremesDrop = FALSE)
# plot the result
simGG(Sim)
This gives an empty plot with the warning
Warning messages:
1: In min(obj$sims[, x]) : no non-missing arguments to min; returning Inf
2: In max(obj$sims[, x]) : no non-missing arguments to max; returning -Inf
And the Sim$sims
object appears indeed to be empty.
My questions are:
Why do the results from predict()
and the use of the formula differ?
Why does the simPH
package not calculate the relative risk?
Which method should I choose? My hypothesis is a quadratic effect in a cox regression and I need a plot for this predictor with its relative risk (compared to the predictor being at its mean value), just like in the example.
Upvotes: 1
Views: 766
Reputation: 88
Quick answer to the simPH issue: the polynomial terms need to be specified in the coxph
call using the I
function, e.g.:
cox_mod <- coxph(Surv(time, status) ~
ph.karno + pat.karno + meal.cal + I(meal.cal^2),
data = lung)
(The error handling in your use case is pretty poor.)
Using this modification (and 1000 simulations) with your code above should return something like:
predict
My guess to the differences is that simPH doesn't create confidence intervals around the transformed point estimates like predict
. It draws simulations from the multivariate normal distribution specified by the fitted model, then shows the central 50% and 95% of this simulated distribution. The central line is just the median of the sims. It is explicitly a different logic from predict
. For very non-monotonic quantities of interest, like this one, predict
point estimates give highly substantively misleading results compared to simPH. There is little evidence for such a form based on 4 observations.
Upvotes: 1