Reputation: 349
This question is about what happens to the Hazard Ratio reference level in a Cox PH regression using factors, when the default reference level gets subsetted out, and specifically if that behaviour can be changed within the coxph subset operation.
The application is Cause Specific Hazards, so I'd like the flexibility to analyse multiple subsets of the variables - one for each of the competing risks - without creating multiple instances of the All Cause data set. (Note: this is exploratory, not testing a hypothesis.)
Define a contrived df that behaves reasonably (e.g. statistically significant, not too many warnings)...
smpls <- 50
df <- data.frame(time = c(sample.int(10, smpls, replace = TRUE),
                          sample.int(20, smpls, replace = TRUE),
                          sample.int(30, smpls, replace = TRUE)),
                 status = 1,
                 x = as.factor(c(rep("A", smpls),
                                 rep("B", smpls),
                                 rep("C", smpls))))
Load the relevant library...
library(survival)
Do a base case which outputs HR coefficients for levels B and C, with level A as the reference...
coxph(Surv(time, status) ~ x, df)
Then subset out level A. This seems to have the effect of making level C the reference for the level B coefficient.
coxph(Surv(time, status) ~ x, df, subset=x!="A")
In that last example, how would I 'force' the reference to be B instead of C?
Upvotes: 3
Views: 2498
Reputation: 93761
One option is to use mutate from the dplyr package, which allows you to modify the data frame on the fly:
library(dplyr)
Keep all three levels, but set reference level to B:
coxph(Surv(time, status) ~ x, data = mutate(df, x = relevel(x, ref="B")))
Get rid of level A and set reference level to B:
We also use droplevels here, so that level A is not only removed from the data frame, but also dropped as a possible level for x. You don't have to call droplevels, but without it you get a warning and the summary output will have a row of missing values for the A level.
coxph(Surv(time, status) ~ x,
      data = df %>%
        filter(x != "A") %>%
        mutate(x = droplevels(relevel(x, ref = "B"))))
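If you would rather stay in base R and keep everything inside the coxph call, one possible sketch is to rebuild the factor in the formula itself. This relies on two assumptions: that the default na.action (na.omit) is in effect, so rows whose level is not listed become NA and are dropped, and that listing "B" first makes it the reference. The data below is a stand-in in the same shape as the question's df, with an arbitrary seed; the names fit_formula, fit_subset and df_bc are only illustrative.

```r
library(survival)

# Stand-in data in the same shape as the question's df (the seed is an arbitrary choice)
set.seed(1)
smpls <- 50
df <- data.frame(
  time   = c(sample.int(10, smpls, replace = TRUE),
             sample.int(20, smpls, replace = TRUE),
             sample.int(30, smpls, replace = TRUE)),
  status = 1,
  x      = factor(rep(c("A", "B", "C"), each = smpls))
)

# Rebuild the factor inside the formula: "A" is not a listed level, so it becomes NA
# and those rows are removed by the default na.action (na.omit).
# "B" is listed first, so it is the reference level.
fit_formula <- coxph(Surv(time, status) ~ factor(x, levels = c("B", "C")), data = df)

# Base-R equivalent of the filter/droplevels/relevel pipeline, for comparison
df_bc <- droplevels(subset(df, x != "A"))
df_bc$x <- relevel(df_bc$x, ref = "B")
fit_subset <- coxph(Surv(time, status) ~ x, data = df_bc)

# Both fits should report a single coefficient: level C relative to level B
coef(fit_formula)
coef(fit_subset)
```

The trade-off is a longer coefficient name in the output (it carries the whole factor(...) expression), but no second copy of the data set is created.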
Upvotes: 3