Matifou
Matifou

Reputation: 8920

R lm formula: estimate difference between groups along two factors?

I would like to estimate differences between groups/treatments along multiple categories using lm in R. Say I have categories A, B and C and groups control/treated, I would like to estimate the difference in means between treated and control in category A, B, C, i.e. A_control - A_treat, etc

I am able to estimate the individual means with y~Categ:Group -1, but when trying to estimate the difference in means with y~Categ*Group -1, I get the right difference only for A_control - A_treat, while the other difference coefficients are expressed as A_C - A_T - (B_C-B_T). How can I use the formula in lm or format the factors so that I get direct differences? Alternatively, I suspect this might be close to a anova-derivative analysis, so there might be a simpler approach to it?

In the example below, group means difference should be (on average) 0.5 for A, and 0 for B and C.

## create data, group A has a differnce of 0.5 between treated and control
Categ <- rep(LETTERS[1:3], times=400)
Group <- rep(c("Control", "Treat"), each = 600)

set.seed(123)
y <- as.numeric(as.factor(Categ))+
  rnorm(length(Group)) +
  ifelse(Categ=="A" & Group =="Treat",rnorm(mean=0.5, length(Group)),0)

df <- data.frame(Categ, Group, y)

## estimate full coefficients
reg_full <- lm(y~Categ:Group -1, data=df)
coef(reg_full)
#> CategA:GroupControl CategB:GroupControl CategC:GroupControl   CategA:GroupTreat 
#>            1.002340            1.976993            3.085993            1.363267 
#>   CategB:GroupTreat   CategC:GroupTreat 
#>            2.039785            3.016188

## estimate difference between treated/control for each category
reg_diff <- lm(y~Categ*Group -1, data=df)
coef(reg_diff)
#>            CategA            CategB            CategC        GroupTreat 
#>         1.0023399         1.9769931         3.0859931         0.3609268 
#> CategB:GroupTreat CategC:GroupTreat 
#>        -0.2981345        -0.4307316

## Want to have:
coef(reg_full)[4:6]-coef(reg_full)[1:3]
#> CategA:GroupTreat CategB:GroupTreat CategC:GroupTreat 
#>         0.3609268         0.0627923        -0.0698048
 
## actually have:
coef(reg_diff)[4:6]
#>        GroupTreat CategB:GroupTreat CategC:GroupTreat 
#>         0.3609268        -0.2981345        -0.4307316

## which corresponds to:
c(coef(reg_full)[4]-coef(reg_full)[1],
  coef(reg_full)[1]-coef(reg_full)[4] - (coef(reg_full)[2:3]-coef(reg_full)[5:6]))
#>   CategA:GroupTreat CategB:GroupControl CategC:GroupControl 
#>           0.3609268          -0.2981345          -0.4307316

Created on 2022-10-16 by the reprex package (v2.0.1)

Upvotes: 2

Views: 192

Answers (1)

G. Grothendieck
G. Grothendieck

Reputation: 270045

Try TukeyHSD like this

TukeyHSD(aov(reg_full))

giving:

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = reg_full)

$`Categ:Group`
                          diff         lwr        upr     p adj
B:Control-A:Control  0.9746533  0.66967220  1.2796344 0.0000000
C:Control-A:Control  2.0836533  1.77867221  2.3886344 0.0000000
A:Treat-A:Control    0.3609268  0.05594575  0.6659079 0.0097924
B:Treat-A:Control    1.0374456  0.73246449  1.3424267 0.0000000
C:Treat-A:Control    2.0138485  1.70886741  2.3188296 0.0000000
C:Control-B:Control  1.1090000  0.80401893  1.4139811 0.0000000
A:Treat-B:Control   -0.6137264 -0.91870753 -0.3087454 0.0000002
B:Treat-B:Control    0.0627923 -0.24218879  0.3677734 0.9918603
C:Treat-B:Control    1.0391952  0.73421413  1.3441763 0.0000000
A:Treat-C:Control   -1.7227265 -2.02770754 -1.4177454 0.0000000
B:Treat-C:Control   -1.0462077 -1.35118880 -0.7412266 0.0000000
C:Treat-C:Control   -0.0698048 -0.37478588  0.2351763 0.9867776
B:Treat-A:Treat      0.6765187  0.37153766  0.9814998 0.0000000
C:Treat-A:Treat      1.6529217  1.34794057  1.9579027 0.0000000
C:Treat-B:Treat      0.9764029  0.67142184  1.2813840 0.0000000

Upvotes: 1

Related Questions