user9992957
user9992957

Reputation:

Predicting relative risk with predict.coxph, simPH and the formula

There is a great post about the interpretation of the predict.coxph() output. However, I keep getting different results comparing the output from predict.coxph, simPH and the formula for relative risk. Since my hypothesis includes a quadratic effect, I am going to include a polynomial with power 2 in my example.

I use the example from this post.

data("lung")

Predicting relative risk with predict()

# Defining the quadratic predictor
lung$meal.cal_q <- lung$meal.cal^2

# conduct a cox regression with the predictor meal.cal, its quadratic version and some covariates.
cox_mod <- coxph(Surv(time, status) ~
                 ph.karno + pat.karno + meal.cal + meal.cal_q,
                 data = lung)

# a vector of fitted values to predict for
meal.cal_new <- seq(min(lung$meal.cal, na.rm= TRUE), max(lung$meal.cal, 
na.rm= TRUE), by= 1)

# a vector of fitted values to predict for, the quadratic effect
meal.cal_q_new <- meal.cal_new^2

# the length of the vector with the values to predict for
n <- length(meal.cal_new)

# a dataframe with all the values to predict for
lung_new <- data.frame(ph.karno= rep(mean(lung$ph.karno, na.rm= TRUE), n), 
                       pat.karno= rep(mean(lung$pat.karno, na.rm= TRUE), n), 
                       meal.cal= meal.cal_new, 
                       meal.cal_q = meal.cal_q_new)

# predict the relative risk
lung_new$rel_risk <- predict(cox_mod, lung_new,  type= "risk")

Predicting the relative risk with the formula (see the post mentioned above)

# Defining the quadratic predictor
lung$meal.cal_q <- lung$meal.cal^2

# run a cox regression with the predictor meal.cal, its quadratic version and some covariates.
cox_mod <- coxph(Surv(time, status) ~
               ph.karno + pat.karno + meal.cal + meal.cal_q,
             data = lung)

# a vector of fitted values to predict for
meal.cal_new <- seq(min(lung$meal.cal, na.rm= TRUE), max(lung$meal.cal, 
                                                     na.rm= TRUE), by= 1)

# a vector of fitted values to predict for, the quadratic effect
meal.cal_q_new <- meal.cal_new^2

# length of the vector to predict for
n <- length(meal.cal_new)

# A dataframe with the values to make the prediction for
lung_new2 <- data.frame(
             ph.karno= rep(mean(lung$ph.karno, na.rm= TRUE), n), 
             pat.karno= rep(mean(lung$pat.karno, na.rm= TRUE), n), 
             meal.cal= meal.cal_new, 
             meal.cal_q = meal.cal_q_new)

# A dataframe with the values to compare the prediction with
lung_new_mean <- data.frame(
                 ph.karno= rep(mean(lung$ph.karno, na.rm= TRUE), n), 
                 pat.karno= rep(mean(lung$pat.karno, na.rm= TRUE), n), 
                 meal.cal= rep(mean(lung$meal.cal, na.rm= TRUE), n), 
                 meal.cal_q = rep(mean(lung$meal.cal_q, na.rm= TRUE), n))

# extract the coefficients
coefCPH <- coef(cox_mod)

# make the prediction for the values of interest
cox_risk <-
exp(coefCPH["ph.karno"] * lung_new2[ , "ph.karno"] +
    coefCPH["pat.karno"] * lung_new2[ , "pat.karno"] +
    coefCPH["meal.cal"] * lung_new2[ , "meal.cal"] +
    coefCPH["meal.cal_q"] * lung_new2[ , "meal.cal_q"])

# make the predictions for the values to compare with
cox_risk_mean <-
exp(coefCPH["ph.karno"] * lung_new_mean[ , "ph.karno"] +
    coefCPH["pat.karno"] * lung_new_mean[ , "pat.karno"] +
    coefCPH["meal.cal"] * lung_new_mean[ , "meal.cal"] +
    coefCPH["meal.cal_q"] * lung_new_mean[ , "meal.cal_q"])

# calculate the relative risk
lung_new2$rel_risk <- unlist(cox_risk)/ unlist(cox_risk_mean)

Now the plot with the predicted relative risk using predict() and using the formula:

ggplot(lung_new, aes(meal.cal, rel_risk)) +
       geom_smooth() +
       geom_smooth(data= lung_new2, col= "red")

The plot shows that the predictions are different. I do not understand why this is the case, although the mentioned post shows that the predict function and the formula should give the same result.

Because of this confusion I tried to solve the problem with the simPH package. Here is what I did:

# Defining the quadratic predictor
lung$meal.cal_q <- lung$meal.cal^2

# run a cox regression with the predictor, its quadratic version and some covariates.

cox_mod <- coxph(Surv(time, status) ~
                 ph.karno + pat.karno + meal.cal + meal.cal_q,
                 data = lung)

# a vector of fitted values to predict for
meal.cal_new <- seq(min(lung$meal.cal, na.rm= TRUE),
                    max(lung$meal.cal, na.rm= TRUE), by= 1)

# length of the vector to predict for
n <- length(meal.cal_new)

# A vector with the values to compare the prediction with
meal.cal_new_mean <- rep(mean(lung$meal.cal, na.rm= TRUE), n)

# running 100 simulations per predictor value with coxsimPoly
Sim <- coxsimPoly(obj= cox_mod, b = "meal.cal", pow = 2,
                  qi = "Relative Hazard",
                  Xj = meal.cal_new,
                  Xl = meal.cal_new_mean,
                  ci = .95,
                  nsim = 100,
                  extremesDrop = FALSE)

# plot the result
simGG(Sim)

This gives an empty plot with the warning

Warning messages:
1: In min(obj$sims[, x]) : no non-missing arguments to min; returning Inf
2: In max(obj$sims[, x]) : no non-missing arguments to max; returning -Inf

And the Sim$simsobject appears indeed to be empty.

My questions are:

  1. Why do the results from predict() and the use of the formula differ?

  2. Why does the simPH package not calculate the relative risk?

  3. Which method should I choose? My hypothesis is a quadratic effect in a cox regression and I need a plot for this predictor with its relative risk (compared to the predictor being at its mean value), just like in the example.

Upvotes: 1

Views: 766

Answers (1)

Christopher Gandrud
Christopher Gandrud

Reputation: 88

Quick answer to the simPH issue: the polynomial terms need to be specified in the coxph call using the I function, e.g.:

cox_mod <- coxph(Surv(time, status) ~
                 ph.karno + pat.karno + meal.cal + I(meal.cal^2),
             data = lung)

(The error handling in your use case is pretty poor.)

Using this modification (and 1000 simulations) with your code above should return something like:

enter image description here

Differences between simPH and predict

My guess to the differences is that simPH doesn't create confidence intervals around the transformed point estimates like predict. It draws simulations from the multivariate normal distribution specified by the fitted model, then shows the central 50% and 95% of this simulated distribution. The central line is just the median of the sims. It is explicitly a different logic from predict. For very non-monotonic quantities of interest, like this one, predict point estimates give highly substantively misleading results compared to simPH. There is little evidence for such a form based on 4 observations.

Upvotes: 1

Related Questions