user14212134

Change order of categorical variable and reference category using lm

I have an unordered categorical variable (event_time) with 5 different options ("future", "past", "prebirth", "never", "uncertain") as a predictor variable, and I want to make "never" the reference category (ideally without transforming the variable). I'm just using lm and then texreg::screenreg(list(m1, m2, m3)) to compare output for models with different outcome variables but this same predictor.

If there's a way to rearrange the order in which the categories show up in the model output (perhaps within screenreg?), that'd be wonderful.

And an added bonus if this can all be done without transforming the variable or dealing with factors (I know how to do this with relevel if the variable were a factor already). Thanks much.

Some data:

structure(list(yvar = c(4.43024525984776, -3.01051231657988, 
4.70993862460106, -2.03636967067474, -1.09802960848352, -1.16527740798651, 
5.6002805983151, -7.03524067599639, 1.02474010023752, 0.647438645180132
), event_time = c(NA, "Pre", "Future", "Time unknown", "Future", "Future", NA, 
"Never", NA, "Never"), race = c("Black", "Black", "White", "Black", 
"Black", "Black", "Black", "White", "Black", "White"), log_parent_income = c(4.0073333, 
NA, 3.8066626, 2.1972246, 0.69314718, 4.2484951, 3.9120231, 1.9459101, 
2.3025851, 3.8066626)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

And then I'm just fitting a simple lm(yvar ~ event_time + log_parent_income + race ...) model.

Upvotes: 3

Views: 3745

Answers (2)

Ben Bolker

Reputation: 226961

I don't know if this will make you happy or not, but here goes.

A helper function that will be useful for reordering:

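## For each pattern in `regex`, return the position of the single element
## of `target` that it matches (NA if none; error if more than one).
match_pattern <- function(regex, target) {
    sapply(regex, function(x) {
        g <- grep(x,target)
        if (length(g)==0) return(NA)
        if (length(g)>1) stop("multiple matches")
        return(g)
    })
}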

Fit the model. Here I'm using the forcats package because fct_relevel is less fussy about accepting a character vector (i.e. I don't need relevel(factor(event_time), "Never")).

m1 <- lm(yvar~event_time,
         data=transform(dd, event_time=forcats::fct_relevel(event_time,"Never")))

If you like the tidyverse you can make it slightly more compact:

library(dplyr)    # for %>%, mutate(), across()
library(forcats)  # for fct_relevel()
dd %>% mutate(across(event_time, ~fct_relevel(.,"Never"))) %>%
     lm(formula=yvar~event_time)
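Whichever version you use, the releveling can be checked without texreg. A minimal sketch using the question's sample data (yvar rounded for brevity; the name dd is assumed, since the question doesn't name the data frame). It uses base relevel(factor(...)) as a stand-in for fct_relevel so it runs without extra packages:

```r
# Sample data from the question (yvar rounded); `dd` is an assumed name.
dd <- data.frame(
  yvar = c(4.43, -3.01, 4.71, -2.04, -1.10, -1.17, 5.60, -7.04, 1.02, 0.65),
  event_time = c(NA, "Pre", "Future", "Time unknown", "Future", "Future",
                 NA, "Never", NA, "Never"),
  stringsAsFactors = FALSE
)

# Base-R stand-in for fct_relevel(): convert to factor, then move "Never" first.
dd$event_time <- relevel(factor(dd$event_time), ref = "Never")
m_check <- lm(yvar ~ event_time, data = dd)

levels(dd$event_time)  # "Never" first, the remaining levels keep their order
names(coef(m_check))   # no separate "Never" term: it is the baseline (intercept)
```

The reference level is simply the first factor level, so inspecting levels() is usually enough to confirm which category the other coefficients are compared against.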

Now texreg::screenreg(m1) will actually output the coefficients in your preferred order ("Future", "Pre", "Time unknown") because it happens to be alphabetical. If you wanted to change the order to something else you could:

ref_order <-  c("(Intercept)", "Time unknown", "Future", "Pre")
pp <- match_pattern(ref_order,names(coef(m1)))
texreg::screenreg(m1, reorder.coef=pp)
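To see what the helper actually hands to reorder.coef, here is a small sketch that runs it on a plain character vector of coefficient names (the names are what lm would produce for the sample data with "Never" as the reference; the helper is repeated so the snippet runs on its own):

```r
# Helper from above, repeated so this snippet is self-contained.
match_pattern <- function(regex, target) {
    sapply(regex, function(x) {
        g <- grep(x, target)
        if (length(g) == 0) return(NA)
        if (length(g) > 1) stop("multiple matches")
        return(g)
    })
}

# Coefficient names as lm() would label them for the sample data.
coef_names <- c("(Intercept)", "event_timeFuture",
                "event_timePre", "event_timeTime unknown")

ref_order <- c("(Intercept)", "Time unknown", "Future", "Pre")
pp <- match_pattern(ref_order, coef_names)
unname(pp)  # positions of the coefficients in the desired display order: 1 4 2 3
```

Note that the patterns are treated as regular expressions and matched as substrings, so a short pattern that hits more than one coefficient name triggers the "multiple matches" error; in that case a more specific pattern is needed.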

While it would theoretically be possible to do what you want without touching the data set (by setting up a custom contrast), I think it would be considerably harder. In the long run, trying to work in a language without adopting its idioms can be tough; you might try figuring out what you don't like about factors and addressing that (the forcats package can be helpful for some tasks).

Upvotes: 0

Onyambu

Reputation: 79348

In base R, you can change the contrasts directly in the lm call:

 lm(yvar ~ C(event_time, base = 2)+ log_parent_income + race, data = df)

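This works if you know the index of the level you want as the base (in the sample data, "Never" is the second level in alphabetical order).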

If you know that the reference level is the last one, then you can do:

 lm(yvar ~ event_time + log_parent_income + race, data = df, 
     contrasts = list(event_time = "contr.SAS"))

If you want the same behaviour for several variables, you can change the global option instead:

 options(contrasts = rep("contr.SAS",2))
 lm(yvar ~ event_time + log_parent_income + race, data = df)

This assumes that "Never" is the last level. You can play with the base argument of contr.treatment to set the reference to any level index you want.
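As a sketch of that last idea (not necessarily the only way to do it): contr.treatment also accepts a vector of level names, so the contrast matrix can be built by name and passed through the contrasts argument, leaving the character column untouched. The data frame name df and the rounded yvar values are assumptions based on the question's sample data.

```r
# Sample data from the question (yvar rounded); `df` is an assumed name.
df <- data.frame(
  yvar = c(4.43, -3.01, 4.71, -2.04, -1.10, -1.17, 5.60, -7.04, 1.02, 0.65),
  event_time = c(NA, "Pre", "Future", "Time unknown", "Future", "Future",
                 NA, "Never", NA, "Never"),
  stringsAsFactors = FALSE
)

# Levels in the order lm() will use when it converts the character column.
lev <- levels(factor(df$event_time))

# Treatment contrasts with "Never" as the base, looked up by name.
cm <- contr.treatment(lev, base = match("Never", lev))

m_contr <- lm(yvar ~ event_time, data = df,
              contrasts = list(event_time = cm))
names(coef(m_contr))  # the contrast columns carry the level names
```

This relies on the levels seen by lm matching lev; if rows dropped for missing values in other predictors remove a level entirely, the matrix would need to be rebuilt from the rows actually used.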

Lastly, you can write a function that takes the base argument as a character string:

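# Like C(), but `base` may be given as a level name instead of an index.
# `contr` and `how.many` are kept in the signature but are not used.
C1 <- function (object, contr, how.many, ...) 
{
  base <- list(...)$base
  # Translate a level name into its position among the factor levels
  if(!is.null(base) && is.character(base))
      base <- match(base, levels(factor(object)))
  C(object, base = base)
}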

Then you could use it as:

lm(yvar~C1(event_time, base = "Never"), df)
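A quick sanity check, sketched on the question's sample data (yvar rounded; df is an assumed name): the fit with C1 should give the same estimates as releveling the factor by hand. The helper is repeated here so the snippet runs on its own.

```r
# Sample data from the question (yvar rounded); `df` is an assumed name.
df <- data.frame(
  yvar = c(4.43, -3.01, 4.71, -2.04, -1.10, -1.17, 5.60, -7.04, 1.02, 0.65),
  event_time = c(NA, "Pre", "Future", "Time unknown", "Future", "Future",
                 NA, "Never", NA, "Never"),
  stringsAsFactors = FALSE
)

# Helper from above: accepts the base level by name.
C1 <- function(object, contr, how.many, ...) {
  base <- list(...)$base
  if (!is.null(base) && is.character(base))
    base <- match(base, levels(factor(object)))
  C(object, base = base)
}

m_c1  <- lm(yvar ~ C1(event_time, base = "Never"), data = df)
m_ref <- lm(yvar ~ relevel(factor(event_time), ref = "Never"), data = df)

# Same estimates either way; only the coefficient labels differ.
all.equal(unname(coef(m_c1)), unname(coef(m_ref)))
names(coef(m_c1))  # labels come from the contrast matrix columns
```

Because the labels are taken from the contrast matrix, it is worth checking names(coef(m_c1)) before building a table, since they may show level positions rather than level names.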

If that is not enough, you could also supply a function to the contrasts argument. With that approach, I believe the level names will be maintained.

Upvotes: 2
