Reputation: 20329
I want to automatically determine which coefficients in a lm
belong to a factor. So assume that I have the following models:
d <- data.frame(a = gl(4, 2, 16), b = gl(2, 1, 16),
x = runif(16), y = runif(16), Y = runif(16))
l1 <- lm(Y ~ a + b + x + y, data = d)
l2 <- lm(Y ~ x + y, data = d)
Then the names of the coefficients of the first model are as follows:
names(coef(l1))
# [1] "(Intercept)" "a2" "a3" "a4" "b2"
# [6] "x" "y"
Now ideally I would like a function which tells me that a2, a3, a4
and b2
are coefficients of dummy-coded factors.
For a model which does not contain any factors (like l2
), the output should be NULL
.
I had a look at str(l1)
and I saw that there is (in case of the presence of factors in the model) a slot xlevels
. I could use names(l1$xlevels)
to get a list of all factors in the model and then use grep
on the names of the coefficients:
names(coef(l1))[unlist(sapply(names(l1$xlevels), function(.) grep(., names(coef(l1)))))]
# [1] "a2" "a3" "a4" "b2"
But this seems to me like a very dirty work-around and won't work as soon as I have similar names in my model:
d$a4 <- runif(16)
l3 <- update(l1, . ~ . + a4, data = d)
names(coef(l3))[unlist(sapply(names(l3$xlevels), function(.) grep(., names(coef(l3)))))]
# [1] "a2" "a3" "a4" "a4" "b2"
Also, a change of the default contrasts will change the name of the dummy coefficients in my models, so even the most elaborate strategy working on the names of the coefficients probably won't work.
Long story short: how do I get a list of all coefficients which belong to a factor?
Upvotes: 4
Views: 434
Reputation: 20329
After the fruitful discussion in the comments, I finally came up with this solution. Please note that I slightly changed the desired outcome to return not only coefficients assigned to factors, but also to differentiate whether those belong to a factor main effect, a factor-factor interaction or a factor-variable interaction. I included all the use cases out of the discussion and the output is as expected a proper characterization of the coefficients.
getCoefficientType <- function(mod) {
INTCPT <- "(Intercept)"
te <- mod$terms
hasIntercept <- attr(te, "intercept") == 1
## factor terms
predictors <- attr(te, "dataClasses")
factors <- names(predictors[predictors == "factor"])
if (hasIntercept) {
termLabels <- c(INTCPT, attr(te, "term.labels"))
} else {
termLabels <- attr(te, "term.labels")
}
## - loop through all terms in the model
## - split interactions at ":" into atoms
## - check if any of the atoms occurs in the factor list
types <- sapply(strsplit(termLabels, ":"), function(x) {
ind <- x %in% factors
if (length(x) == 1) {
if (x == INTCPT) {
"intercept"
} else if (ind) {
"factor.main"
} else {
"variable.main"
}
} else {
if (all(ind)) {
"factor.factor.interaction"
} else if (!any(ind)) {
"variable.variable.interaction"
} else {
"factor.variable.interaction"
}
}
})
setNames(rep(types, rle(mod$assign)$length), names(coef(mod)))
}
d <- data.frame(a = gl(4, 2, 16), b = gl(2, 1, 16),
x = runif(16), y = runif(16), Y = runif(16), a4 = runif(16))
l1 <- lm(Y ~ a + b + x + y, data = d)
l2 <- lm(Y ~ x + y, data = d)
l3 <- update(l1, . ~ . + a4, data = d)
l4 <- update(l3, contrasts = list(a = "contr.poly"))
l5 <- update(l2, . ~ . + a:x + x:y)
l6 <- update(l5, . ~ . - 1)
getCoefficientType(l1)
getCoefficientType(l2)
getCoefficientType(l3)
getCoefficientType(l4)
getCoefficientType(l5)
getCoefficientType(l6)
Upvotes: 1
Reputation: 269421
Here are some approaches:
1) This assumes that any column of the model.matrix containing only zeros and ones belongs to a factor (except the intercept). It works for l1
, l2
and l3
, is quite short, does not depend on names (except for the intercept) and does not require fiddling with lm
object components. It works for both main effects and interactions since if the main effects are 0/1 then the interactions will be too. l4
in the comments is an exammple where the 0/1 assumption does not hold.
m <- model.matrix(l1)
all01 <- apply(m == 0 | m == 1, 2, all)
setdiff(names(all01[all01]), "(Intercept)")
## [1] "a2" "a3" "a4" "b2"
2) This does not use names (except for intercept) and works for l1
, l2
and l3
(and l4
in the comments). It does not assume anything about the model matrix but only works for main effects only models. (The no-intercept case is untested.)
cls <- attr(terms(l1), "dataClass")
intercept <- if ("(Intercept)" %in% names(coef(l1))) "" else "+ 0"
fn <- function(nm) names(coef(update(l1, paste(". ~", nm, intercept))))
setdiff(unlist(lapply(names(cls)[cls == "factor"], fn)), "(Intercept)")
Upvotes: 1