Reputation: 8920
I would like to estimate differences between groups/treatments along multiple categories using lm
in R. Say I have categories A, B and C and groups control/treated, I would like to estimate the difference in means between treated and control in category A, B, C, i.e. A_control - A_treat
, etc
I am able to estimate the individual means with y~Categ:Group -1
, but when trying to estimate the difference in means with y~Categ*Group -1
, I get the right difference only for A_control - A_treat
, while the other difference coefficients are expressed as A_C - A_T - (B_C-B_T)
. How can I use the formula in lm
or format the factors so that I get direct differences? Alternatively, I suspect this might be close to a anova-derivative analysis, so there might be a simpler approach to it?
In the example below, group means difference should be (on average) 0.5 for A, and 0 for B and C.
## create data, group A has a differnce of 0.5 between treated and control
Categ <- rep(LETTERS[1:3], times=400)
Group <- rep(c("Control", "Treat"), each = 600)
set.seed(123)
y <- as.numeric(as.factor(Categ))+
rnorm(length(Group)) +
ifelse(Categ=="A" & Group =="Treat",rnorm(mean=0.5, length(Group)),0)
df <- data.frame(Categ, Group, y)
## estimate full coefficients
reg_full <- lm(y~Categ:Group -1, data=df)
coef(reg_full)
#> CategA:GroupControl CategB:GroupControl CategC:GroupControl CategA:GroupTreat
#> 1.002340 1.976993 3.085993 1.363267
#> CategB:GroupTreat CategC:GroupTreat
#> 2.039785 3.016188
## estimate difference between treated/control for each category
reg_diff <- lm(y~Categ*Group -1, data=df)
coef(reg_diff)
#> CategA CategB CategC GroupTreat
#> 1.0023399 1.9769931 3.0859931 0.3609268
#> CategB:GroupTreat CategC:GroupTreat
#> -0.2981345 -0.4307316
## Want to have:
coef(reg_full)[4:6]-coef(reg_full)[1:3]
#> CategA:GroupTreat CategB:GroupTreat CategC:GroupTreat
#> 0.3609268 0.0627923 -0.0698048
## actually have:
coef(reg_diff)[4:6]
#> GroupTreat CategB:GroupTreat CategC:GroupTreat
#> 0.3609268 -0.2981345 -0.4307316
## which corresponds to:
c(coef(reg_full)[4]-coef(reg_full)[1],
coef(reg_full)[1]-coef(reg_full)[4] - (coef(reg_full)[2:3]-coef(reg_full)[5:6]))
#> CategA:GroupTreat CategB:GroupControl CategC:GroupControl
#> 0.3609268 -0.2981345 -0.4307316
Created on 2022-10-16 by the reprex package (v2.0.1)
Upvotes: 2
Views: 192
Reputation: 270045
Try TukeyHSD like this
TukeyHSD(aov(reg_full))
giving:
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = reg_full)
$`Categ:Group`
diff lwr upr p adj
B:Control-A:Control 0.9746533 0.66967220 1.2796344 0.0000000
C:Control-A:Control 2.0836533 1.77867221 2.3886344 0.0000000
A:Treat-A:Control 0.3609268 0.05594575 0.6659079 0.0097924
B:Treat-A:Control 1.0374456 0.73246449 1.3424267 0.0000000
C:Treat-A:Control 2.0138485 1.70886741 2.3188296 0.0000000
C:Control-B:Control 1.1090000 0.80401893 1.4139811 0.0000000
A:Treat-B:Control -0.6137264 -0.91870753 -0.3087454 0.0000002
B:Treat-B:Control 0.0627923 -0.24218879 0.3677734 0.9918603
C:Treat-B:Control 1.0391952 0.73421413 1.3441763 0.0000000
A:Treat-C:Control -1.7227265 -2.02770754 -1.4177454 0.0000000
B:Treat-C:Control -1.0462077 -1.35118880 -0.7412266 0.0000000
C:Treat-C:Control -0.0698048 -0.37478588 0.2351763 0.9867776
B:Treat-A:Treat 0.6765187 0.37153766 0.9814998 0.0000000
C:Treat-A:Treat 1.6529217 1.34794057 1.9579027 0.0000000
C:Treat-B:Treat 0.9764029 0.67142184 1.2813840 0.0000000
Upvotes: 1