Linear model in R doesn't fit properly

Question

I know that the title doesn't specify exactly what I mean, so let me explain it here. I am working on a dataset that records the yield of wheat for each of four wheat types (A, B, C, D). My issue arises when fitting a linear model:

When I fit lm1 = lm(yield ~ type), R takes the first wheat type (A) as the reference level, reports it as the intercept, and then estimates the effect of each of the other types relative to it. I know that I can instead fit lm2 = lm(yield ~ 0 + type), which gives a separate estimate of the yield for each type; however, what I really want to see is a sort of combination of the two.

Is there an option to fit a linear model in R such that lm3 = lm(yield ~ GlobalIntercept + type), where GlobalIntercept would represent the general intercept of my linear model, so that I could see the influence of each type of wheat relative to that general intercept? So it would be like the first model, except that this time we would estimate the influence of all types of wheat (A, B, C, D) on the general yield.

G. Grothendieck · Accepted Answer

Questions on SO should include a minimal reproducible example with data -- see the instructions at the top of the r tag page. Since the question did not include one, we will provide it this time by using the built-in InsectSprays data set that comes with R.

Here are a few approaches:

1) lm/contr.sum/dummy.coef Try using contr.sum sum-to-zero contrasts for the spray factor and look at the dummy coefficients. That will expand the coefficients to include all 6 levels of the spray factor in this example:

fm <- lm(count ~ spray, InsectSprays, contrasts = list(spray = contr.sum))
dummy.coef(fm)
## Full coefficients are 
##                                                                           
## (Intercept):          9.5                                                  
## spray:                  A         B         C         D         E         F
##                  5.000000  5.833333 -7.416667 -4.583333 -6.000000  7.166667

sum(dummy.coef(fm)$spray)  # check that coefs sum to zero
## [1] 0
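
To see where the extra coefficient comes from, here is a small sketch on the same InsectSprays data (the names b and effects_all are ours, chosen only for illustration). With contr.sum, coef() reports the intercept plus the first five deviations; the effect for the last level is implied by the sum-to-zero constraint, so it equals minus the sum of the others.

```r
fm <- lm(count ~ spray, InsectSprays, contrasts = list(spray = contr.sum))

b <- coef(fm)  # (Intercept), spray1, ..., spray5

# Append the implied effect for the last level (F): minus the sum of the rest
effects_all <- c(b[-1], -sum(b[-1]))
names(effects_all) <- levels(InsectSprays$spray)
effects_all

# Should agree with the expanded coefficients from dummy.coef
all.equal(unname(effects_all), unname(dummy.coef(fm)$spray))
```

For the wheat data in the question the analogous call would presumably be lm(yield ~ type, wheat, contrasts = list(type = contr.sum)), assuming type is a factor and the data frame is called wheat (a hypothetical name, since the question does not show the data).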

2) tapply If each level has the same number of rows in the data set, as is the case with InsectSprays where each level has 12 rows, then we can take the mean for each level and subtract the Intercept (which, in the balanced case, is the overall mean). This does not work if the data set is unbalanced, i.e. if the different levels have different numbers of rows, because the contr.sum intercept is then the unweighted mean of the level means rather than the overall mean of the observations. Note how the calculations below give the same result as (1).

mean(InsectSprays$count)  # intercept
## [1] 9.5

with(InsectSprays, tapply(count, spray, mean) - mean(count))
##         A         B         C         D         E         F 
##  5.000000  5.833333 -7.416667 -4.583333 -6.000000  7.166667
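
As a rough illustration of the unbalanced case, the sketch below drops the first 6 rows of InsectSprays (all from spray A) and compares the quantities involved. The object names unbal, group_means and fm_u are ours, and the choice of which rows to drop is arbitrary.

```r
# Make the data unbalanced: spray A now has 6 rows, the others still have 12
unbal <- InsectSprays[-(1:6), ]

group_means <- with(unbal, tapply(count, spray, mean))
fm_u <- lm(count ~ spray, unbal, contrasts = list(spray = contr.sum))

# The contr.sum intercept equals the unweighted mean of the group means,
# which now differs from the overall mean of the observations
c(intercept           = unname(coef(fm_u)[1]),
  mean_of_group_means = mean(group_means),
  overall_mean        = mean(unbal$count))

# Subtracting the mean of the group means reproduces dummy.coef(fm_u)$spray
group_means - mean(group_means)
```

So in the unbalanced case the tapply idea can still be used, provided we subtract the mean of the level means instead of mean(count).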

3) aov/model.tables We can also use aov with model.tables like this:

fm2 <- aov(count ~ spray, InsectSprays)
model.tables(fm2)
## Tables of effects
##
##  spray 
## spray
##      A      B      C      D      E      F 
##  5.000  5.833 -7.417 -4.583 -6.000  7.167 

model.tables(fm2, type = "means")
## Tables of means
## Grand mean
##    
## 9.5 
##
##  spray 
## spray
##      A      B      C      D      E      F 
## 14.500 15.333  2.083  4.917  3.500 16.667
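
If the numbers are needed programmatically rather than printed, the following sketch pulls them out of the model.tables objects. It assumes the returned object has a tables component with a "Grand mean" entry and one entry per term, which is how we understand its structure; the names eff, mns and grand are ours.

```r
fm2 <- aov(count ~ spray, InsectSprays)

# Effects: deviations of each spray from the grand mean
eff <- as.numeric(model.tables(fm2)$tables$spray)

# Means: grand mean plus the per-spray means
mns   <- model.tables(fm2, type = "means")$tables
grand <- as.numeric(mns[["Grand mean"]])

# Per-spray mean minus grand mean should reproduce the effects
setNames(as.numeric(mns$spray) - grand, levels(InsectSprays$spray))
all.equal(as.numeric(mns$spray) - grand, eff)
```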

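4) emmeans We can use lm followed by emmeans like this. Note that this reports the estimated mean for each level, as in the table of means in (3), rather than deviations from the grand mean: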

library(emmeans)

fm <- lm(count ~ spray, InsectSprays)
emmeans(fm, "spray")
##  spray emmean   SE df lower.CL upper.CL
##  A      14.50 1.13 66   12.240    16.76
##  B      15.33 1.13 66   13.073    17.59
##  C       2.08 1.13 66   -0.177     4.34
##  D       4.92 1.13 66    2.656     7.18
##  E       3.50 1.13 66    1.240     5.76
##  F      16.67 1.13 66   14.406    18.93
##
## Confidence level used: 0.95
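
To turn these means into deviations from their average, emmeans offers the "eff" contrast method. The sketch below is guarded with requireNamespace so that it is simply skipped when the package is not installed; the names emm and eff are ours.

```r
if (requireNamespace("emmeans", quietly = TRUE)) {
  fm  <- lm(count ~ spray, InsectSprays)
  emm <- emmeans::emmeans(fm, "spray")

  # "eff" compares each level with the average of all the level means
  eff <- emmeans::contrast(emm, method = "eff")
  print(eff)
}
```

Unlike (1) to (3), this also reports a standard error and a test for each deviation. The p-values shown are multiplicity-adjusted by default.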
