JFJ

Reputation: 5

Coefficients for all levels of a categorical factor unchanged in lmer after adding variables

I am estimating lmer models with subject random effects for a within-subject design. I have one measurement of the dependent variable for each subject in each of three treatment conditions, resulting in a balanced design. In addition to the treatment dummies I also have control variables in the lmer model.

The first thing that stuck out was that all treatment dummies had equal standard errors, which has already been asked and answered here:

https://stats.stackexchange.com/questions/170018/equal-standard-errors-for-all-levels-of-a-categorical-factor-in-lmer

The second thing that stuck out was that the coefficients of the treatment dummies do not change if I add control variables to the model.

Here the behavior of lmer is reproduced with some simulated data:

library(tidyverse)
library(lme4)
library(lmerTest)

#Some data:
id <- 1:50 #subject id (rep() is unnecessary here)
dependent_1 <- rnorm(50,10,5) #dependent measure in treatment 1
dependent_2 <- rnorm(50,18,3) #dependent measure in treatment 2
dependent_3 <- rnorm(50,28,4) #dependent measure in treatment 3
control_a <- rnorm(50, 100, 5) #first control
control_b <- rnorm(50, 200,33) #second control

df <- data.frame(id, dependent_1, dependent_2, dependent_3, control_a, control_b) #make dataframe

#Reshape to long form
df_long <- pivot_longer(df, 
                        cols = starts_with("dependent_"), 
                        names_to = c(".value","treatment"),
                        names_sep = "_")

#Treatment to factor
df_long$treatment <- as.factor(df_long$treatment)

#LMER Models
lmer_model.1 <- lmer(dependent ~ treatment +(1|id), data = df_long, REML = FALSE) #Model with treatment dummies only
lmer_model.2 <- lmer(dependent ~ treatment + control_a + control_b + (1|id), data = df_long, REML = FALSE) #Model with treatment dummies and controls
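The side-by-side comparison table below can be produced with, for example, the texreg package (an assumption on my part; any model-comparison tool works just as well):

library(texreg)
screenreg(list(lmer_model.1, lmer_model.2)) #print both models in one table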

I get the following results:

===============================================================
                            Model 1               Model 2             
---------------------------------------------------------------
(Intercept)             9.246 (0.567) ***    17.535 (7.796) *  
treatment2              8.157 (0.787) ***     8.157 (0.787) ***
treatment3             20.030 (0.787) ***    20.030 (0.787) ***
control_a                                    -0.067 (0.072)    
control_b                                    -0.008 (0.011)    
---------------------------------------------------------------
AIC                   852.194               854.977            
BIC                   867.247               876.051            
Log Likelihood       -421.097              -420.488            
Num. obs.             150                   150                
Num. groups: id        50                    50                
Var: id (Intercept)     0.596                 0.457            
Var: Residual          15.492                15.492            
===============================================================
*** p < 0.001; ** p < 0.01; * p < 0.05

Can anyone explain why this happens?

Upvotes: 0

Views: 363

Answers (1)

benimwolfspelz

Reputation: 690

This is apparently about statistics, not programming. Consider asking it on Cross Validated instead.

It seems that the answer to your question lies in the way you set up your example data. An additional control/predictor variable X2 only influences the estimate of another, previously included predictor X1 if the two are (at least slightly) correlated. In real-life data this is almost always the case, as you hardly ever get correlations of exactly r = .00. But you set up your data in a way such that treatment is orthogonal to control_a and control_b: each control is drawn once per subject, and each subject appears in all three conditions, so the controls are perfectly balanced across treatment levels. Therefore, including either control does not impact the coefficients of the treatment dummies.
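A minimal sketch of checking this orthogonality directly, using the objects from your question:

#The treatment dummies are uncorrelated with the subject-level controls
#(exactly zero, up to floating-point error), so adding the controls
#cannot move the treatment coefficients.
X <- model.matrix(~ treatment, data = df_long)[, -1] #dummy columns only
cor(X, df_long[, c("control_a", "control_b")])

With real, non-orthogonal controls you would see the treatment estimates shift at least a little.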

Upvotes: 0
