WannabeGandalf

Reputation: 99

How to get value of group = 0 in linear mixed model

I probably have a very simple statistics question. I am fitting linear mixed models like this:

lme(dependent ~ Group + Sex + Age + npgs, data=boookclub, random = ~ 1| subject)

Group is a factor with levels 0, 1, 2, 3. The dependent variables are continuous and standardized (mean 0). The others are covariates: Sex is a factor with levels Male/Female, Age is numeric, and npgs is numeric, continuous and standardized as well. When I print the table with beta, standard error, t and p values, I get this:

                    Value  Std.Error  DF   t-value p-value
(Intercept)   -0.04550502 0.02933385 187 -1.551280  0.0025
Group1         0.04219801 0.03536929 181  1.193069  0.2344
Group2         0.03350827 0.03705896 181  0.904188  0.3671
Group3         0.00192119 0.03012654 181  0.063771  0.9492
SexMale        0.03866387 0.05012901 181  0.771287  0.4415
Age           -0.00011675 0.00148684 181 -0.078520  0.9375
npgs           0.15308844 0.01637163 181  9.350835  0.0000
SexMale:Age    0.00492966 0.00276117 181  1.785352  0.0759 

My problem is: how do I get the beta of Group0? Here the intercept corresponds to Group0, but also to the average of npgs, since npgs is standardized. And how can I check whether Group0 is significantly associated with the dependent variable? I'd like to see the effect of all Group levels.

Thanks

Upvotes: 1

Views: 266

Answers (1)

Ben Bolker

Reputation: 226522

The easiest way to do what you want may be with the emmeans package, but you may also have some conceptual issues. Technical details first, then conceptual:

Technical

Fitting an example (this isn't necessarily statistically sensible, but I wanted an example with a categorical fixed effect):

library(nlme)
m1 <- lme(Yield~Variety, random = ~1|Block, data=Alfalfa)

As with your example, the effects are "intercept" (= mean of the baseline group, which is the "Cossack" variety in this case [by default, the alphabetically-first group]), "Ladak" (difference between Ladak and Cossack means) and "Ranger" (similarly). (As @Ben hints in the comments above, R automatically generates dummies for [most of] the levels of the categorical variables [factors] in your model.)

coef(summary(m1))

##                     Value  Std.Error DF    t-value      p-value
## (Intercept)    1.57166667 0.11665326 64 13.4729767 2.373343e-20
## VarietyLadak   0.09458333 0.07900687 64  1.1971532 2.356624e-01
## VarietyRanger -0.01916667 0.07900687 64 -0.2425949 8.090950e-01
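
To make the arithmetic concrete, here is a minimal sketch of rebuilding each group's estimated mean from those coefficients: the baseline mean is the intercept, and every other group's mean is the intercept plus that group's coefficient. The name group_means is just an illustrative variable, not something from nlme.

```r
library(nlme)

# Same example model as above
m1 <- lme(Yield ~ Variety, random = ~1 | Block, data = Alfalfa)

b <- fixef(m1)  # named vector of fixed-effect estimates

# Baseline mean = intercept; any other group mean = intercept + its coefficient
group_means <- c(
  Cossack = b[["(Intercept)"]],
  Ladak   = b[["(Intercept)"]] + b[["VarietyLadak"]],
  Ranger  = b[["(Intercept)"]] + b[["VarietyRanger"]]
)
print(group_means)
```

With the coefficients shown above this works out to roughly 1.57, 1.67 and 1.55. In the question's model the same logic would apply to Group0 through Group3, but the intercept there would also depend on where the other covariates are set.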

The emmeans package is a convenient way to see predicted values for each group without recoding.

library(emmeans)
emmeans(m1, specs = ~Variety)
##  Variety emmean    SE df lower.CL upper.CL
##  Cossack   1.57 0.117  5     1.27     1.87
##  Ladak     1.67 0.117  5     1.37     1.97
##  Ranger    1.55 0.117  5     1.25     1.85

Conceptual

You can't "check if Group0 is significantly associated with the dependent [response] variable". You can only check whether the response variable differs significantly between two groups, or whether it differs significantly among all groups (e.g. the results of anova()). You have to pick a baseline. (If you insist, you can test all pairwise comparisons among groups; emmeans can help with this too.) If you "remove the intercept" (by fitting Yield ~ Variety - 1, or by looking at the results that emmeans produces) then the difference you are quantifying is the difference between the mean of a particular group and zero. This is usually not a meaningful question; in the example here, for instance, this would be testing whether an alfalfa variety gave a yield that was significantly greater than zero, which is probably not very interesting.
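
As a rough sketch of the two kinds of comparison described above, on the same Alfalfa example: anova() gives the overall test of whether the groups differ at all, and pairs() on an emmeans object gives all pairwise differences. The object names overall, emm and pw are only illustrative.

```r
library(nlme)
library(emmeans)

m1 <- lme(Yield ~ Variety, random = ~1 | Block, data = Alfalfa)

# Overall test: does mean Yield differ among varieties at all?
overall <- anova(m1)
print(overall)

# All pairwise differences between variety means
# (emmeans applies a Tukey adjustment to these p-values by default)
emm <- emmeans(m1, specs = ~Variety)
pw <- pairs(emm)
print(pw)
```

For the model in the question, the analogous calls would presumably use ~Group in place of ~Variety.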

On the other hand, if you are just interested in estimating the expected value in each group (at reference values of the other variables in the model; by default emmeans averages over the levels of other factors and holds numeric covariates at their means), along with the standard errors/CIs, then the answers you get from emmeans are perfectly sensible.
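
For comparison, here is a minimal sketch of the "remove the intercept" parameterisation on the same example (m0 is just an illustrative name). Each coefficient is then a group mean rather than a difference from the baseline, so the estimates should agree with the emmeans output, while the t-tests in the coefficient table compare each mean with zero.

```r
library(nlme)

# Default (treatment-contrast) fit and the no-intercept fit
m1 <- lme(Yield ~ Variety, random = ~1 | Block, data = Alfalfa)
m0 <- lme(Yield ~ Variety - 1, random = ~1 | Block, data = Alfalfa)

# One coefficient per variety; each row tests "group mean = 0"
print(coef(summary(m0)))

# The Cossack coefficient of m0 should match the intercept of m1
print(c(no_intercept = fixef(m0)[["VarietyCossack"]],
        intercept    = fixef(m1)[["(Intercept)"]]))
```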

There's a related question here that explains why you get an NA value if you manually create dummies for every level of your factor ...
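
A small sketch of that NA behaviour, using plain lm() and ignoring the Block random effect for simplicity (so this is an illustration of the collinearity, not a refit of the mixed model). The dummy column names d_Cossack, d_Ladak and d_Ranger are made up for this example.

```r
library(nlme)  # only needed here for the Alfalfa data

d <- as.data.frame(Alfalfa)

# Hand-made 0/1 dummies, one per level of Variety
d$d_Cossack <- as.numeric(d$Variety == "Cossack")
d$d_Ladak   <- as.numeric(d$Variety == "Ladak")
d$d_Ranger  <- as.numeric(d$Variety == "Ranger")

# Intercept + all three dummies: the dummies sum to the intercept column,
# so one coefficient cannot be estimated and lm() reports it as NA
fit <- lm(Yield ~ d_Cossack + d_Ladak + d_Ranger, data = d)
print(coef(fit))
```

Dropping one dummy (which is what R does automatically for a factor) or dropping the intercept removes the redundancy.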

Upvotes: 4
