tom91

Reputation: 685

Dummy variables in R

I'm constructing a linear model to evaluate the effect of distance from a habitat boundary on the richness of an order of insects. Different equipment was used across samples, so I am including equipment as a categorical variable to check that it hasn't had a significant effect on richness.

The categorical factor has 3 levels, so I asked R to produce dummy variables in the lm call with the code:

lm(Richness ~ Distances + factor(Equipment), data = Data) 

When I ask for the summary of the model I can see two of the levels with their coefficients. I assume this means R is using one of the levels as the "standard" (reference) against which the coefficients of the other levels are compared.

How can I find the coefficient for the third level in order to see what effect it has on the model?

Thank you

Upvotes: 1

Views: 3002

Answers (2)

bkielstr

Reputation: 419

To determine how to extract your coefficient, here is a simple example:

# load data 
data(mtcars)
head(mtcars)

# what are the means of wt given the factor carb?
(means <- with(mtcars, tapply(wt, factor(carb), mean)))

# run the lm
mod <- lm(wt ~ factor(carb), data = mtcars)

# extract the coefficients
coef(mod)

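# the intercept is the mean of wt for the reference level (carb 1)
coef(mod)[1]
# the remaining coefficients are differences from the reference level
coef(mod)[2:6]
# intercept + difference recovers the mean of each non-reference level
coef(mod)[1] + coef(mod)[2:6]
# compare with the group means computed above
means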

So you can see that the coefficients are simply added to the reference level (i.e., the intercept) in this simple case. However, if you have a covariate, it gets more complicated:

mod2 <- lm(wt ~ factor(carb) + disp, data=mtcars)
summary(mod2)

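The intercept is now the expected wt for carb 1 when disp = 0, and each carb coefficient is the difference from carb 1 at any fixed value of disp.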
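
As a rough sketch of what that looks like in practice, predict() can return the fitted wt for every carb level at a chosen value of disp. The names dat, carb_f, nd, pred and manual are made up for this illustration, and disp = 0 is used only so the result lines up with the intercept:

```r
# Sketch only: same mtcars example, with the factor stored as a column
dat <- transform(mtcars, carb_f = factor(carb))
mod2 <- lm(wt ~ carb_f + disp, data = dat)

# One row per carb level, all evaluated at disp = 0
nd <- data.frame(
  carb_f = factor(levels(dat$carb_f), levels = levels(dat$carb_f)),
  disp = 0
)
pred <- predict(mod2, newdata = nd)

# The same values by hand: the reference level is the intercept,
# every other level is the intercept plus its own coefficient
b <- coef(mod2)
manual <- c(b[1], b[1] + b[2:6])

# Compare the two approaches
all.equal(unname(pred), unname(manual))
```

Changing disp in nd to a more realistic value (for example the mean of disp) shifts all the predictions by the same amount, so the differences between levels stay the same.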

Upvotes: 1

rbatt

Reputation: 4797

You can use lm(y ~ x - 1) to remove the intercept, which in your case corresponds to the reference level of the factor; R then reports a coefficient for each level of the factor. That said, there are statistical reasons for using one of the levels as a reference.
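
A minimal sketch of both options, using mtcars as a stand-in because the asker's data is not available. The names dat, carb_f, carb_ref4, mod_noint and mod_ref4 are hypothetical, and carb 4 is an arbitrary choice of new reference level:

```r
# Sketch only: mtcars stands in for the asker's data
dat <- transform(mtcars, carb_f = factor(carb))

# Option 1: drop the intercept, giving one coefficient per level
# (with no other predictors, each one is the group mean of wt)
mod_noint <- lm(wt ~ carb_f - 1, data = dat)
coef(mod_noint)

# Option 2: keep the intercept but choose a different reference level
dat$carb_ref4 <- relevel(dat$carb_f, ref = "4")
mod_ref4 <- lm(wt ~ carb_ref4, data = dat)
coef(mod_ref4)
```

Both fits describe the same model, only parameterised differently. One thing to watch with the no-intercept form is that summary() computes R-squared differently when there is no intercept, so it should not be compared with the R-squared from the default fit.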

Upvotes: 2
