Can I re-define a factor's reference level *within* a subsetted Cox PH?

Question

This question is about what happens to the hazard ratio reference level in a Cox PH regression with a factor covariate when the default reference level is subsetted out, and specifically whether that behaviour can be changed within the coxph subset operation.

The application is cause-specific hazards, so I'd like the flexibility to analyse multiple subsets of the data, one for each of the competing risks, without creating multiple copies of the all-cause data set. (Note: this is exploratory, not testing a hypothesis.)

Define a contrived df that behaves reasonably (e.g. statistically significant, not too many warnings)...

smpls <- 50
df <- data.frame(time=c(sample.int(10,smpls, replace=TRUE), 
                        sample.int(20,smpls, replace=TRUE), 
                        sample.int(30,smpls, replace=TRUE)),
                 status=1, 
                 x=as.factor(c(rep("A",smpls),
                               rep("B",smpls),
                               rep("C",smpls))))

Load the relevant library...

library(survival)

Fit a base case, which outputs HR coefficients for levels B and C, with level A as the reference...

coxph(Surv(time, status) ~ x, df)

Then subset out level A. This seems to have the effect of selecting level C as the reference for the coefficient of level B.

coxph(Surv(time, status) ~ x, df, subset=x!="A")

In that last example, how would I 'force' the reference to be B instead of C?

eipi10 · Accepted Answer

One option is to use mutate from the dplyr package, which allows you to modify the data frame on the fly:

library(dplyr)

Keep all three levels, but set reference level to B:

coxph(Surv(time, status) ~ x, data = mutate(df, x = relevel(x, ref="B")))
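To see what relevel changes, it can help to look at the factor levels directly. Below is a minimal base R sketch on a toy factor (the names f and f_b are only for illustration and are not part of the question). With R's default treatment contrasts, the first level is the one used as the reference.

```r
# Toy factor with the same three levels as x in the question
f <- factor(c("A", "B", "C"))
levels(f)    # "A" comes first, so it is the default reference

# relevel moves the chosen level to the front and keeps the others in order
f_b <- relevel(f, ref = "B")
levels(f_b)  # "B" is now first, so it becomes the reference
```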

Get rid of level A and set the reference level to B. We also use droplevels here, so that level A is not only removed from the rows of the data frame but also dropped as a possible level of x. You don't have to call droplevels, but without it you get a warning and the summary output will have a row of missing values for level A.

coxph(Surv(time, status) ~ x, 
  data = df %>% 
    filter(x != "A") %>% 
    mutate(x = droplevels(relevel(x, ref="B"))))
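If you would rather not load dplyr, the same idea can be sketched in base R with subset and transform, which also build the modified data frame on the fly inside the coxph call. This is an alternative under stated assumptions, not the code from the answer above: the set.seed call and the name fit are added here purely for illustration.

```r
library(survival)

set.seed(1)  # assumption: seed added only to make this sketch reproducible
smpls <- 50
df <- data.frame(time = c(sample.int(10, smpls, replace = TRUE),
                          sample.int(20, smpls, replace = TRUE),
                          sample.int(30, smpls, replace = TRUE)),
                 status = 1,
                 x = as.factor(c(rep("A", smpls),
                                 rep("B", smpls),
                                 rep("C", smpls))))

# Remove the rows for level A, drop the now-unused level,
# then make B the reference, all without storing a second data frame
fit <- coxph(Surv(time, status) ~ x,
             data = transform(subset(df, x != "A"),
                              x = relevel(droplevels(x), ref = "B")))

# Only a coefficient for C (relative to B) should remain
names(coef(fit))
```

Here droplevels is applied before relevel; since relevel only reorders levels and droplevels only removes unused ones, the order of the two calls should not change the result.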
