Ojaljohn

Reputation: 13

extract coefficients of a factor from lm object using names

I have fitted an lm model as follows:

data <- data.frame(x=rnorm(50), x2=runif(50), y=rnorm(50), g=rep(1:3,length.out=50))
model <- lm(y ~ x + x2 + factor(g), data=data)

I want to extract the coefficient for each level of the factor variable by referring to it by name, the same way I would for a continuous variable like 'x':

model$coefficients["x"]

I have tried using:

> model$coefficients["g"]
<NA> 
  NA 

But this returns NA, because the coefficients for the factor levels are renamed, as can be seen below:

> model$coefficients
(Intercept)           x          x2  factor(g)2  factor(g)3 
 0.60058881  0.01232678 -0.65508242 -0.25919674 -0.04841089
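For reference, these names follow a fixed pattern: the term label as written in the formula (here `factor(g)`) pasted to each non-reference level. The sketch below assumes the same simulated data as above (the `set.seed()` call is an addition for reproducibility) and R's default treatment contrasts, under which the first level is the baseline and gets no coefficient of its own. The levels used in the fit are stored in `model$xlevels` under the same term label, so the exact names can be rebuilt and used as indices.

```r
# Same data as in the question; set.seed() added only for reproducibility
set.seed(1)
data <- data.frame(x = rnorm(50), x2 = runif(50), y = rnorm(50),
                   g = rep(1:3, length.out = 50))
model <- lm(y ~ x + x2 + factor(g), data = data)

# Coefficient names: term label "factor(g)" pasted to each non-reference level
names(coef(model))
# c("(Intercept)", "x", "x2", "factor(g)2", "factor(g)3")

# The factor levels used in the fit, keyed by the same term label
model$xlevels[["factor(g)"]]
# c("1", "2", "3")

# Rebuild the exact names (assumes default treatment contrasts:
# level "1" is the baseline) and index the coefficient vector with them
g_names <- paste0("factor(g)", model$xlevels[["factor(g)"]][-1])
coef(model)[g_names]
```

Converting `g` to a factor inside the data frame before fitting (`data$g <- factor(data$g)`) would give the shorter names `g2` and `g3` instead.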

I have also tried using the displayed names:

model$coefficients["factor(g)2"]

but it doesn't work. How can I get this right?

Many thanks.

Upvotes: 1

Views: 2703

Answers (1)

Max Gordon

Reputation: 5457

In these cases I always use the coef() function together with grep(). I would do something like this:

data <- data.frame(x=rnorm(50), x2=runif(50), y=rnorm(50), g=rep(1:3,length.out=50))
model <- lm(y ~ x + x2 + factor(g), data=data)
estimates <- coef(model)
# Just get the coefficient for level 2 of g
estimates[grep("^factor\\(g\\)2", names(estimates))]

# If you want the coefficients for both levels, just drop the 2
estimates[grep("^factor\\(g\\)", names(estimates))]

# This case does not really require fancy 
# regular expressions so you could write
estimates[grep("factor(g)", names(estimates), fixed=TRUE)]
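If all you need is a literal prefix match, base R's `startsWith()` (available since R 3.3.0) avoids regular expressions entirely and, unlike `grep(..., fixed=TRUE)`, only matches at the beginning of the name. This is a minimal sketch reusing the model above; note it is still just a prefix test, so it has the same similar-name problem shown in the next example.

```r
set.seed(1)
data <- data.frame(x = rnorm(50), x2 = runif(50), y = rnorm(50),
                   g = rep(1:3, length.out = 50))
model <- lm(y ~ x + x2 + factor(g), data = data)
estimates <- coef(model)

# Keep coefficients whose names start with the literal string "factor(g)"
g_est <- estimates[startsWith(names(estimates), "factor(g)")]
g_est
```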

# This comes in much more handy when you have a more complex situation where
# coefficients have similar names
data <- data.frame(x=rnorm(50), great_g_var=runif(50), y=rnorm(50),
                   g_var=factor(rep(1:3,length.out=50)),
                   g_var2=factor(sample(1:3,size=50, replace=TRUE)))

model <- lm(y ~ x + great_g_var + g_var + g_var2, data=data)
estimates <- coef(model)

# Now if you want to do a simple fixed grep you could end up
# with unexpected estimates
estimates[grep("g_var", names(estimates), fixed=TRUE)]

# Returns:
# great_g_var       g_var2       g_var3      g_var22      g_var23 
# -0.361707955 -0.058988495  0.010967326 -0.008952616 -0.297461520 

# Therefore you may want to use regular expressions, here's how you select g_var
estimates[grep("^g_var[0-9]$", names(estimates))]

# Returns:
# g_var2      g_var3 
# -0.05898849  0.01096733 

# And if you want the g_var2 coefficients, you write:
estimates[grep("^g_var2[0-9]$", names(estimates))]

# Returns:
# g_var22      g_var23 
# -0.008952616 -0.297461520 
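The single-digit pattern `[0-9]` only works while each factor has fewer than ten levels, and widening it to `[0-9]+` would make `^g_var[0-9]+$` match `g_var22` as well. An alternative that avoids pattern matching altogether is to build the exact names from the levels stored in `model$xlevels`. The helper below, `factor_coefs()`, is a hypothetical function written for this example (it is not part of base R), and it assumes the default treatment contrasts; with other contrasts such as `contr.sum` the coefficient names differ.

```r
# Hypothetical helper (not part of base R): return the coefficients of one
# factor term by constructing their exact names from model$xlevels.
# Assumes default treatment contrasts, i.e. the first level is the baseline.
factor_coefs <- function(model, term) {
  lev <- model$xlevels[[term]]
  if (is.null(lev)) stop("no factor term named ", term)
  est <- coef(model)
  est[paste0(term, lev[-1])]
}

set.seed(1)
data <- data.frame(x = rnorm(50), great_g_var = runif(50), y = rnorm(50),
                   g_var = factor(rep(1:3, length.out = 50)),
                   g_var2 = factor(sample(1:3, size = 50, replace = TRUE)))
model <- lm(y ~ x + great_g_var + g_var + g_var2, data = data)

factor_coefs(model, "g_var")   # coefficients named g_var2, g_var3
factor_coefs(model, "g_var2")  # coefficients named g_var22, g_var23
```

Because the names are matched exactly, `great_g_var` is never picked up and `g_var` and `g_var2` cannot be confused.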

Upvotes: 3
