RNB
RNB

Reputation: 477

How to create difference of difference contrasts with multcomp

I am building a linear model in R and I have specific planned contrasts I am trying to implement. The model involves an interaction term. I believe I know how to correctly specify the matrix that glht() takes in order to provide estimates for the contrasts. However, i'm unsure how to specify interaction planned contrasts in the matrix. Here is a reproducible example:

library(multcomp)

# Create minimal dataset

set.seed(100)
df <- data.frame(response = rnorm(2000), 
                 age = sample(15:60, 2000, replace = TRUE),
                 sex = rep(c("Male", "Female"), each = 1000), 
                 ses = rep(c("L", "LM", "MH", "H"), 500))

# Basic linear model
m1 <- lm(response ~ age + sex * ses, 
         data = df)

summary(m1)

Call:
lm(formula = response ~ age + sex * ses, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.3423 -0.6697  0.0044  0.6716  3.2542 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)   -0.0514746  0.0889766  -0.579    0.563
age           -0.0003435  0.0016934  -0.203    0.839
sexMale        0.0819362  0.0900868   0.910    0.363
sesL           0.0769168  0.0901245   0.853    0.394
sesLM          0.1289027  0.0901339   1.430    0.153
sesMH          0.0681291  0.0900821   0.756    0.450
sexMale:sesL  -0.0661439  0.1274402  -0.519    0.604
sexMale:sesLM -0.0925924  0.1275157  -0.726    0.468
sexMale:sesMH -0.1188700  0.1273978  -0.933    0.351

Residual standard error: 1.007 on 1991 degrees of freedom
Multiple R-squared:  0.001591,  Adjusted R-squared:  -0.002421 
F-statistic: 0.3966 on 8 and 1991 DF,  p-value: 0.9229

# Create a matrix of linear combinations
# Here 'H' and 'Female' are the reference categories, 

lc <- rbind("Female:H" = c(1, 0, 0, 0, 0, 0, 0, 0, 0),
            "Female:L" = c(1, 0, 0, 1, 0, 0, 0, 0, 0),
            "Female:LM" = c(1, 0, 0, 0, 1, 0, 0, 0, 0),
            "Female:MH" = c(1, 0, 0, 0, 0, 1, 0, 0, 0),
            "Male vs Female difference in L vs H" = c(1, 0, 0, 0, 0, 0, 1, 0, 0),
            "Male vs Female difference in LM vs H" = c(1, 0, 0, 0, 0, 0, 0, 1, 0),
            "Male vs Female difference in MH vs H" = c(1, 0, 0, 0, 0, 0, 0, 0, 1))

# Create matrix for the linfct argument in glht()
# One contrast compares the LM to the L category in females
# Another contrast compares the pooled LM and L categories to the H category in females

k <- rbind("Females: LM vs. L" = 
             lc["Female:LM", ] - lc["Female:L", ],
           "Females: Pooled L and LM vs H" =  
             ((lc["Female:LM", ] + lc["Female:L", ]) / 2) - lc["Female:H", ])

c1 <- glht(m1, linfct = k)
s1 <- summary(c1, adjusted(type = "none"))

confint(c1, adjusted(type = "none"))

I would like to add to k, contrasts:

Edit

  1. The interaction term that compares the additional difference between L and LM in Men vs. Women
  2. The interaction term that compares the additional difference of pooled L and LM categories versus H in Men vs. Women

For example, in the summary output above you would interpret the sexMale:sesL term as: The additional difference between L and H, for males vs. females.

This post in CrossValidated https://stats.stackexchange.com/questions/112944/design-of-matrix-of-contrasts-in-r was instrumental in getting me this far. However, i'm not sure what linear combinations from lc above I should be subtracting in k to get the coefficients I want.

Help is greatly appreciated.

Upvotes: 3

Views: 1174

Answers (1)

cuttlefish44
cuttlefish44

Reputation: 6796

Unfortunately, I don't think your lc and k are correct. The matrix's col-number corresponds to names(m1$coef)'s elments number. [1] Intercept, [2] age, [3] sexMale, [4] sesL, [5] sesLM, [6] sesMH, [7] sexMale:sesL, [8] sexMale:sesLM, [9] sexMale:sesMH.
So, "Female:L" = c(1, 0, 1, 0, 0, 0, 0, 0, 0) means "Male:H", and "Female:LM" = c(1, 0, 0, 1, 0, 0, 0, 0, 0) means Female:L. You used these difference as k[,1]. Attention to s1's Estimate, please.

s1
# Linear Hypotheses:
#                                     Estimate Std. Error t value Pr(>|t|)
# Females: LM vs. L == 0             -0.005019   0.090091  -0.056    0.956
# Females: Pooled L and LM vs H == 0  0.079427   0.078038   1.018    0.309

# In fact, you did "sexMale:H vs Female:L".
m1$coef["sesL"] - m1$coef["sexMale"]  # -0.005019421
You can get correct matrix by the method the post you linked showed.
group <- paste0(df$sex, df$ses)
group <- aggregate(model.matrix(m1) ~ group, FUN=mean)
rownames(group) <- group$group
group <- group[,-1]
group$age <- 0
lc2 <- as.matrix(group)
rownames(lc2)  <- c("Female:H", "Female:L", "Female:LM", "Female:MH", 
                    "Male:H", "Male:L", "Male:LM", "Male:MH")

lc2
          (Intercept) age sexMale sesL sesLM sesMH sexMale:sesL sexMale:sesLM sexMale:sesMH
Female:H            1   0       0    0     0     0            0             0             0
Female:L            1   0       0    1     0     0            0             0             0
Female:LM           1   0       0    0     1     0            0             0             0
Female:MH           1   0       0    0     0     1            0             0             0
Male:H              1   0       1    0     0     0            0             0             0
Male:L              1   0       1    1     0     0            1             0             0
Male:LM             1   0       1    0     1     0            0             1             0
Male:MH             1   0       1    0     0     1            0             0             1
k2 <- rbind("Females: LM vs. L" = 
              lc2["Female:LM", ] - lc2["Female:L", ],
            "Females: Pooled L and LM vs H" =  
              ((lc2["Female:LM", ] + lc2["Female:L", ]) / 2) - lc2["Female:H", ])

c1_2 <- glht(m1, linfct = k2)
s1_2 <- summary(c1_2, adjusted(type = "none"))

s1_2
# Linear Hypotheses:
#                                    Estimate Std. Error t value Pr(>|t|)
# Females: LM vs. L == 0              0.05199    0.09008   0.577    0.564
# Females: Pooled L and LM vs H == 0  0.10291    0.07807   1.318    0.188

m1$coef["sesLM"] - m1$coef["sesL"]       # 0.05198589
(m1$coef["sesL"] + m1$coef["sesLM"])/2   # 0.1029097 

[Edited]

The identity of sexMale is the interaction term Male : H just to be sure.

m2 <- lm(response ~ age + ses + sex:ses , data = df)
summary(m2) # excerpt
# sesH:sexMale   0.0819362  0.0900868   0.910    0.363

summary(m1) # excerpt
# sexMale        0.0819362  0.0900868   0.910    0.363

(Note: difference doesn't have an intercept because it is a common term)

k3 <- rbind(
  "Male vs Female difference in L vs H"  = 
    (lc2["Male:L",] - lc2["Male:H",]) - (lc2["Female:L",] - lc2["Female:H",]),
  "Male vs Female difference in LM vs H" = 
    (lc2["Male:LM",] - lc2["Male:H",]) - (lc2["Female:LM",] - lc2["Female:H",]),
  "Male vs Female difference in MH vs H" = 
    (lc2["Male:MH",] - lc2["Male:H",]) - (lc2["Female:MH",] - lc2["Female:H",]),
  # above is modified lc[5:7, ], below is what you want
  "Male vs Female difference in LM vs L" = 
    (lc2["Male:LM",] - lc2["Male:L",]) - (lc2["Female:LM",] - lc2["Female:L",]),
  "Male vs Female difference in Pooled L and LM vs H" = 
    ((lc2["Male:L", ] + lc2["Male:LM", ]) / 2 - lc2["Male:H", ])
      - ((lc2["Female:L", ] + lc2["Female:LM", ]) / 2 - lc2["Female:H", ])
)

k3   # I deleted some character, space, digit because of the space and viewability
                 (Intercept) age sexM  sesL sesLM  sesMH sexM:sesL sexM:sesLM sexM:sesMH
M vs F diff in L vs H      0   0    0     0     0      0         1          0          0
M vs F diff in LM vs H     0   0    0     0     0      0         0          1          0
M vs F diff in MH vs H     0   0    0     0     0      0         0          0          1
M vs F diff in LM vs L     0   0    0     0     0      0        -1          1          0
MvsF diff in L&LM vs H     0   0    0     0     0      0       0.5        0.5          0

Upvotes: 2

Related Questions