Reputation: 13
I know that the title doesn't specify exactly what I mean, so let me explain it here. I am working on a dataset that consists of wheat yields for a given wheat type (A, B, C, D). My issue arises when I try to fit a linear model like this:
lm1 = lm(yield ~ type)
When I do so, R takes the first wheat type (A) as the reference, reports it as the global intercept, and then estimates the influence of each of the other types on the yield.
I know that I can fit a linear model like such:
lm2 = lm(yield ~ 0 + type)
This gives me estimates of the influence of each type on the yield. However, what I really want to see is a sort of combination of the two.
Is there an option to fit a linear model in R such that
lm3 = lm(yield ~ GlobalIntercept + type)
where GlobalIntercept would represent the general intercept of my linear model and then I could see the influence of each type of wheat on that general intercept. So kind of like in the first model though this time we'd estimate the influence of all types of wheat (A,B,C,D) on the general yield.
Upvotes: 1
Views: 384
Reputation: 269586
Questions to SO should include minimal reproducible example data -- see instructions at the top of the r tag page. Since the question did not include this, we will provide it this time by using the built-in InsectSprays data set that comes with R.
Here are a few approaches:
1) lm/contr.sum/dummy.coef Try using contr.sum sum-to-zero contrasts for the spray factor and look at the dummy coefficients. That will expand the coefficients to include all 6 levels of the spray factor in this example:
fm <- lm(count ~ spray, InsectSprays, contrasts = list(spray = contr.sum))
dummy.coef(fm)
## Full coefficients are
##
## (Intercept): 9.5
## spray:               A         B         C         D         E         F
##               5.000000  5.833333 -7.416667 -4.583333 -6.000000  7.166667
sum(dummy.coef(fm)$spray) # check that coefs sum to zero
## [1] 0
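To make the connection between the fitted coefficients and the expanded ones explicit, here is a minimal sketch using only base R. With contr.sum, lm estimates an intercept plus one coefficient for each level except the last; the effect for the last level (F here) is implied by the sum-to-zero constraint. The names cf, est, last and effects_all are just illustrative.

```r
# Fit with sum-to-zero contrasts, as in (1)
fm <- lm(count ~ spray, InsectSprays, contrasts = list(spray = contr.sum))

cf <- coef(fm)          # (Intercept) plus 5 spray coefficients (levels A to E)
est <- cf[-1]           # estimated effects for levels A to E
last <- -sum(est)       # effect for level F, implied by the constraint

# Put all 6 effects together, labelled by level
effects_all <- setNames(c(est, last), levels(InsectSprays$spray))
effects_all

# Should agree with the expanded coefficients from dummy.coef
all.equal(unname(effects_all), unname(dummy.coef(fm)$spray))
```

So dummy.coef is not fitting anything new; it only reports the coefficient that the constraint leaves implicit.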
2) tapply If each level has the same number of rows in the data set, as is the case with InsectSprays where each level has 12 rows, then we can take the mean for each level and then subtract the Intercept (which is the overall mean). This does not work if the data set is unbalanced, i.e. if the different levels have different numbers of rows. Note how the calculations below give the same result as (1).
mean(InsectSprays$count) # intercept
## [1] 9.5
with(InsectSprays, tapply(count, spray, mean) - mean(count))
##         A         B         C         D         E         F
##  5.000000  5.833333 -7.416667 -4.583333 -6.000000  7.166667
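To illustrate the caveat about unbalanced data, here is a small sketch that makes InsectSprays unbalanced by dropping the first 6 rows (all of them spray A). This subset is an assumption made purely for illustration. With contr.sum the intercept is then the plain average of the six level means, which no longer equals the overall mean of count, so subtracting the overall mean would not reproduce the contr.sum effects.

```r
# Drop the first 6 rows (all spray A) to create an unbalanced data set
unbal <- InsectSprays[-(1:6), ]
table(unbal$spray)                       # A has 6 rows, the others have 12

fm_u <- lm(count ~ spray, unbal, contrasts = list(spray = contr.sum))
level_means <- with(unbal, tapply(count, spray, mean))

coef(fm_u)[["(Intercept)"]]              # intercept under contr.sum
mean(level_means)                        # unweighted mean of level means: same value
mean(unbal$count)                        # overall mean: differs, it weights levels by size

# Effects that do match dummy.coef(fm_u): subtract the mean of the level means
level_means - mean(level_means)
```

In the unbalanced case, subtracting mean(level_means) rather than mean(count) gives the sum-to-zero effects.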
3) aov/model.tables We can also use aov with model.tables like this:
fm2 <- aov(count ~ spray, InsectSprays)
model.tables(fm2)
## Tables of effects
##
## spray
## spray
## A B C D E F
## 5.000 5.833 -7.417 -4.583 -6.000 7.167
model.tables(fm2, type = "means")
## Tables of means
## Grand mean
##
## 9.5
##
## spray
## spray
## A B C D E F
## 14.500 15.333 2.083 4.917 3.500 16.667
4) emmeans We can use lm followed by emmeans like this:
library(emmeans)
fm <- lm(count ~ spray, InsectSprays)
emmeans(fm, "spray")
##  spray emmean   SE df lower.CL upper.CL
##  A      14.50 1.13 66   12.240    16.76
##  B      15.33 1.13 66   13.073    17.59
##  C       2.08 1.13 66   -0.177     4.34
##  D       4.92 1.13 66    2.656     7.18
##  E       3.50 1.13 66    1.240     5.76
##  F      16.67 1.13 66   14.406    18.93
##
## Confidence level used: 0.95
Upvotes: 3
Reputation: 2949
From the information you provided, I infer that you are modeling the yield as a linear function of type, which has four categories. You expect an intercept in addition to a coefficient for each of the types. This doesn't make sense.
You are predicting the yield from a nominal variable. For a regression intercept to be meaningful, the predictor needs an origin, that is, a zero value. A nominal variable has no origin. With a continuous predictor, the intercept is the value of the dependent variable y when the predictor is zero; in your case that would mean the category of the type is zero, which is practically impossible. That is why your model takes one of the categories as the reference category and calculates the intercept for it. The coefficients then give the change in y when the category differs from the reference category.
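One way to see why R falls back on a reference category is to look at the design matrix. The sketch below borrows R's built-in InsectSprays data (six levels rather than the four wheat types, since no wheat data was posted) and puts a column of ones next to one indicator column per level. The names X_levels, X_full and fit are illustrative only. The indicator columns add up to the column of ones, so the matrix is rank deficient and one coefficient cannot be estimated unless some constraint is imposed.

```r
# One indicator column per level of the factor (no intercept)
X_levels <- model.matrix(~ 0 + spray, InsectSprays)

# Add a column of ones to act as a separate global intercept
X_full <- cbind(Intercept = 1, X_levels)

ncol(X_full)         # 7 columns
qr(X_full)$rank      # rank is only 6: the indicators sum to the intercept column

# lm cannot estimate all 7 coefficients, so one of them is reported as NA
fit <- lm(InsectSprays$count ~ 0 + X_full)
coef(fit)
```

Treating one level as the reference is R's default way of removing that redundancy.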
Upvotes: 1