Reputation: 8863
Suppose I fit a model in R like this:
model = glm(y ~ x + language, family = binomial, data = data)
language is a factor variable; the idea is that there's a different intercept for each language.
Here are the model coefficients:
> coef(model)
  (Intercept)             x languageen-GB languageen-US    languageja    languageko
-17.919438297   0.003119914  -0.427067341  -0.613194669   1.406719444   2.402191148
   languagezh
  0.894899827
One level of the language factor (de) has been chosen as a baseline, and (Intercept) gives the intercept for that baseline. languageen-GB, etc., give intercepts as deltas from the baseline intercept.
This code
coeffs = coef(model)
intercepts = c("baseline" = 0, tail(coeffs, -2)) + coeffs["(Intercept)"]
names(intercepts) <- levels(data$language)
intercepts
pulls out the actual intercepts for each factor level:
       de     en-GB     en-US        ja        ko        zh
-17.91944 -18.34651 -18.53263 -16.51272 -15.51725 -17.02454
But it's horrendous code. There must be a nicer way of doing this with model methods or package functions?
Edit: one particularly unpleasant part is that the tail(coeffs, -2) will break if you change the formula. I suppose some kind of string search could be used here instead.
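A minimal sketch of that string-search idea, assuming default treatment contrasts and simulated data (the data frame d, the seed, and the helper name level_intercepts are all made up for illustration): it looks up the factor's delta coefficients by name, using the levels stored in the fitted model's xlevels component, rather than by position.

```r
# Simulated data, for illustration only (not the data from the question).
set.seed(42)
n <- 300
d <- data.frame(
  x = rnorm(n),
  language = factor(sample(c("de", "en-GB", "ja"), n, replace = TRUE))
)
true_int <- c("de" = -1, "en-GB" = 0, "ja" = 1)
d$y <- rbinom(n, 1, plogis(true_int[as.character(d$language)] + 0.5 * d$x))

fit <- glm(y ~ x + language, family = binomial, data = d)

# Hypothetical helper: per-level intercepts for one factor term,
# assuming treatment contrasts (R's default for unordered factors).
level_intercepts <- function(fit, factor_name) {
  cf <- coef(fit)
  lv <- fit$xlevels[[factor_name]]           # levels used in the fit
  deltas <- cf[paste0(factor_name, lv[-1])]  # non-baseline coefficients, by name
  setNames(cf[["(Intercept)"]] + c(0, unname(deltas)), lv)
}

intercepts <- level_intercepts(fit, "language")
intercepts
```

Because the lookup is by coefficient name, adding or reordering other terms in the formula should not affect it, though it would need adapting for interactions or non-default contrasts.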
Upvotes: 1
Views: 57
Reputation: 76663
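One way of having no baseline factor level is to fit a model with no intercept. This is done with a formula like y ~ 0 + x + . or by adding -1 instead of 0. Without an intercept, R gives every level of the first factor its own coefficient, so each language coefficient is that level's intercept directly.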
model2 <- glm(y ~ 0 + ., data, family = binomial)
intercepts2 <- coef(model2)[-1]
names(intercepts2) <- levels(data$language)
intercepts2
# de en-GB en-US
#15.846295 8.696764 6.562384
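As a quick check of the two spellings mentioned above, the following sketch fits the same no-intercept model with 0 + and with - 1 and compares the coefficients. It uses simulated data; the data frame d and the seed are assumptions for illustration, not the data used in this answer.

```r
# Simulated data, for illustration only.
set.seed(1)
n <- 200
d <- data.frame(
  x = rnorm(n),
  language = factor(sample(c("de", "en-GB", "en-US"), n, replace = TRUE))
)
d$y <- rbinom(n, 1, plogis(0.3 * d$x))

# Two ways of writing "no intercept" in a formula.
fit_zero  <- glm(y ~ 0 + x + language, family = binomial, data = d)
fit_minus <- glm(y ~ x + language - 1, family = binomial, data = d)

# Every level of language gets its own coefficient in both fits.
names(coef(fit_zero))
same <- all.equal(coef(fit_zero), coef(fit_minus))
same
```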
Now compare with the result posted in the question.
model <- glm(y ~ ., data, family = binomial)
coeffs = coef(model)
intercepts = c("baseline" = 0, tail(coeffs, -2)) + coeffs["(Intercept)"]
names(intercepts) <- levels(data$language)
intercepts
# de en-GB en-US
#15.846295 8.696764 6.562384
all.equal(intercepts, intercepts2)
#[1] TRUE
The results are not identical(), since the computations are made in different ways, but the differences are only floating-point noise:
intercepts - intercepts2
# de en-GB en-US
#3.197442e-14 3.907985e-14 3.552714e-14
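The distinction between identical() and all.equal() can be seen on a small example unrelated to the model: identical() compares the values exactly, while all.equal() allows a small numeric tolerance (about 1.5e-8 by default).

```r
a <- 0.1 + 0.2
b <- 0.3

exact <- identical(a, b)          # exact comparison of the two doubles
close <- isTRUE(all.equal(a, b))  # comparison within a numeric tolerance
a - b                             # tiny, but not zero
```

Here exact is FALSE and close is TRUE, which is the same pattern as with the two sets of intercepts.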
Data creation code.
I will adapt the built-in dataset iris as example data.
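# Keep Sepal.Length, Sepal.Width and Species
data <- iris[c(1,2,5)]
# Binary response: 1 if Sepal.Length < 5.8, otherwise 0
data$y <- +(data[[1]] < 5.8)
# Drop Sepal.Length and rename the two remaining predictors
data <- data[-1]
names(data)[c(1,2)] <- c('x', 'language')
# Relabel the three species as language codes
i1 <- data[[2]] == "setosa"
i2 <- data[[2]] == "versicolor"
i3 <- data[[2]] == "virginica"
data[[2]] <- as.character(data[[2]])
data[[2]][i1] <- 'de'
data[[2]][i2] <- 'en-GB'
data[[2]][i3] <- 'en-US'
data[[2]] <- factor(data[[2]])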
Upvotes: 1