Get all Coefficients Belonging to a Factor

Question

I want to automatically determine which coefficients in a lm belong to a factor. So assume that I have the following models:

d <- data.frame(a = gl(4, 2, 16), b = gl(2, 1, 16), 
                x = runif(16), y = runif(16), Y = runif(16))
l1 <- lm(Y ~ a + b + x + y, data = d)
l2 <- lm(Y ~ x + y, data = d)

Then the names of the coefficients of the first model are as follows:

names(coef(l1))
# [1] "(Intercept)" "a2"          "a3"          "a4"          "b2"         
# [6] "x"           "y"

Now ideally I would like a function which tells me that a2, a3, a4and b2 are coefficients of dummy-coded factors.

For a model which does not contain any factors (like l2), the output should be NULL.

I had a look at str(l1) and I saw that there is (in case of the presence of factors in the model) a slot xlevels. I could use names(l1$xlevels) to get a list of all factors in the model and then use grep on the names of the coefficients:

names(coef(l1))[unlist(sapply(names(l1$xlevels), function(.) grep(., names(coef(l1)))))]
# [1] "a2" "a3" "a4" "b2"

But this seems to me like a very dirty work-around and won't work as soon as I have similar names in my model:

d$a4 <- runif(16)
l3 <- update(l1, . ~ . + a4, data = d)
names(coef(l3))[unlist(sapply(names(l3$xlevels), function(.) grep(., names(coef(l3)))))]
# [1] "a2" "a3" "a4" "a4" "b2"

Also, a change of the default contrasts will change the name of the dummy coefficients in my models, so even the most elaborate strategy working on the names of the coefficients probably won't work.

Long story short: how do I get a list of all coefficients which belong to a factor?

thothal · Accepted Answer

After the fruitful discussion in the comments, I finally came up with this solution. Please note that I slightly changed the desired outcome to return not only coefficients assigned to factors, but also to differentiate whether those belong to a factor main effect, a factor-factor interaction or a factor-variable interaction. I included all the use cases out of the discussion and the output is as expected a proper characterization of the coefficients.

getCoefficientType <- function(mod) {
   INTCPT <- "(Intercept)"
   te <- mod$terms
   hasIntercept <- attr(te, "intercept") == 1
   ## factor terms
   predictors <- attr(te, "dataClasses")
   factors <- names(predictors[predictors == "factor"])
   if (hasIntercept) {
      termLabels <- c(INTCPT, attr(te, "term.labels"))
   } else {
      termLabels <- attr(te, "term.labels")
   }
   ## - loop through all terms in the model
   ## - split interactions at ":" into atoms
   ## - check if any of the atoms occurs in the factor list
   types <- sapply(strsplit(termLabels, ":"), function(x) {
      ind <- x %in% factors
      if (length(x) == 1) {
         if (x == INTCPT) {
            "intercept"
         } else if (ind) {
            "factor.main"
         } else {
            "variable.main"
         }
      } else {
         if (all(ind)) {
            "factor.factor.interaction"
         } else if (!any(ind)) {
            "variable.variable.interaction"
         } else {
            "factor.variable.interaction"
         }
      }
   })
   setNames(rep(types, rle(mod$assign)$length), names(coef(mod)))
}

d <- data.frame(a = gl(4, 2, 16), b = gl(2, 1, 16), 
                x = runif(16), y = runif(16), Y = runif(16), a4 = runif(16))
l1 <- lm(Y ~ a + b + x + y, data = d)
l2 <- lm(Y ~ x + y, data = d)
l3 <- update(l1, . ~ . + a4, data = d)
l4 <- update(l3, contrasts = list(a = "contr.poly"))
l5 <- update(l2, . ~ . + a:x + x:y)
l6 <- update(l5, . ~ . - 1)
getCoefficientType(l1)
getCoefficientType(l2)
getCoefficientType(l3)
getCoefficientType(l4)
getCoefficientType(l5)
getCoefficientType(l6)

Get all Coefficients Belonging to a Factor

Answers (2)

Related Questions