poulpe
poulpe

Reputation: 13

How to code categorical variable with three levels for Cox with random effects model?

I want to estimate regression parameters of a Cox random effects model. Let us say that I have a categorical variable with two levels, sex for example. Then coding the variable is straightforward: 0 if male and 1 if female for example. The interpretation of the regression coefficient associated to that variable is simple.

Now let us say that I have a categorical variable with three levels. If I just code the variable with 0,1,2 for the three levels (A,B and C), the estimation of the associated regression coefficient would not be what I am looking for. If I want to estimate the risks associated with each "level" wrt the other levels, how should I code the variable ?

What I have done so far:

I define three variables. I define one variable where I code level A as 1 and the rest (levels B and C) as 0. I define another variable where I code level B as 1 and the rest (levels A and C) as 0. Finally, I define a variable where I code level C as 1 and the rest (levels A and B) as 0.

I then estimate the three regression parameters assocaited to the variables.

Just to be clear, I do not want to use any package such as coxph, coxme, survival, etc.

Is there an easier way to to this ?

Upvotes: 1

Views: 1440

Answers (1)

Ben Bolker
Ben Bolker

Reputation: 226332

Your description (one predictor variable that is all ones, the other two predictor variables as indicator variables for groups B and C) is exactly recapitulating the standard treatment contrasts that R uses.

If you want to construct a model matrix with treatment contrasts for a single factor f (inside a data frame d), then model.matrix(~f, data=d) will work

d <- data.frame(f=factor(c("A","B","B","C","A")))
model.matrix(~f, data=d)

Results:

  (Intercept) fB fC
1           1  0  0
2           1  1  0
3           1  1  0
4           1  0  1
5           1  0  0
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$f
[1] "contr.treatment"

You can use other contrasts if you like; these will change the parameter estimates (and interpretation!) for your individual variables, but not the overall model fit. e.g.

model.matrix(~f , data=d, contrasts=list(f=contr.sum))

Upvotes: 0

Related Questions