Correcting confidence intervals for multiple comparisons using the multcomp package

Question

In this excellent post on cross validated an answer mentioned that it is relatively easy to correct confidence intervals for multiple comparisons.

I am wondering whether this is possible using the multcomp package in R

Here's an example model using the mtcars dataset from base R, a multivariate regression predicting miles per gallon (mpg) from a list of predictors.

set.seed(1234)
df <- mtcars

# which variables predict miles per gallon?
sumMod <- summary(mod <- lm(formula = mpg ~ cyl + disp + hp + drat + wt + qsec + vs + gear,
                            data = df))

Now run the model through the multcomp package using the glht function.

# adjust using multcomp
library(multcomp)
K <- length(coefficients(mod))
modGLHT <- glht(model = mod,
                linfct = diag(K))

In multcomp you choose the method of adjustment using the test = adjusted(type = 'foo') argument within the package's summary() function

sumNone <- summary(modGLHT, test = adjusted(type = "none")) # no adjustment
sumSingleStep <- summary(modGLHT, test = adjusted(type = "single-step")) # single-step method

Now if we compare the p-value for the wt coefficient under no correction and with single-step correction...

data.frame(correction = c("none", "single-step"),
           coefs = c(round(sumNone$test$coefficients[6],2),
                     round(sumSingleStep$test$coefficients[6],2)),
           p = c(round(sumNone$test$pvalues[6],3),
                 round(sumSingleStep$test$pvalues[6],3)))

 # output
 #   correction coefs     p
 #         none -4.36 0.003
 #  single-step -4.36 0.024

We can see that the p-value has increased after applying single-step correction. So far so good.

My issue is how to get corrected confidence intervals?

I tried applying adjusted(type = 'foo') within the confint() function (once again for illustration I have focused on the wt predictor) like so...

# get CIs using four different methods
ciNone <- round(as.data.frame(confint(modGLHT, adjusted(type = "none"))$confint)[6,],3)
ciSS <- round(as.data.frame(confint(modGLHT, adjusted(type = "single-step"))$confint)[6,],3)
ciShaffer <- round(as.data.frame(confint(modGLHT, adjusted(type = "Shaffer"))$confint)[6,],3)
ciWestfall <- round(as.data.frame(confint(modGLHT, adjusted(type = "Westfall"))$confint)[6,],3)

# put them all together for comparison
cbind(data.frame(correction = c("None", "Single_Step", "Shaffer", "Westfall")),
      rbind(ciNone, ciSS, ciShaffer, ciWestfall))

# output
#     correction Estimate    lwr    upr
# 6         None   -4.356 -8.274 -0.439
# 61 Single_Step   -4.356 -8.269 -0.444
# 62     Shaffer   -4.356 -8.276 -0.437
# 63    Westfall   -4.356 -8.279 -0.434

No it looks to me as if something is going on, there are some very small differences. But when I run the confint() function on its own and ...

confint(modGLHT, adjusted(type = "Shaffer"))

the output makes no mention anywhere of type of error correction. I also get no error message. So I can't tell if the adjusted(type = 'foo') argument is doing anything nor whether the above differences in the CIs using the different methods are actual differences due to the method itself or simply an artefact of some randomness in a Markov-type procedure used to generate the intervals.

So is it legitimate to add the adjusted(type = 'foo') argument to the confint() function in multcomp?. The help documentation for the function does not say so and although I got no error message when I did it I cannot tell if it worked.

Any help much appreciated.

dipetkov · Accepted Answer

No, you can't specify type of adjustment when computing confidence intervals with multcomp. As discussed in the comments, you can choose between individual confidence intervals and simultaneous confidence intervals.

Individual confidence intervals: Control the coverage probability of each confidence interval separately, ie., each confidence interval will contain the true parameter 100(1-α)% times if the experiment is repeated.
Simultaneous confidence intervals: Control the overall coverage probability, ie, all confidence intervals will contain their true parameter 100(1-α)% times if the experiment is repeated. This corresponds to family-wise error rate FWER = α because the probability of one or more false discoveries (type I errors) is α. [However, simultaneous CIs are not equivalent to Bonferroni correction; see the update below.]

multcomp takes into account the correlations between hypotheses to get the coverage of simultaneous confidence intervals right and that's all the adjustment needed. The documentation is somewhat clear about all this.

To compute p-values, use the summary function on a glht object and specify how to adjust the p-values with the test = adjusted(type = "method") argument. The available methods include "single-step" (the default) and "bonferroni".

To compute confidence intervals, use the confint function of a glht object and specify how to compute the critical value with the calpha argument. There are two options: univariate_calpha() to compute univariate confidence intervals and adjusted_calpha() to compute simultaneous confidence intervals. Basically, confint ignores the type = "method" argument, if provided.

If multcomp doesn't adjust simultaneous confidence intervals, why does it seem that something is going on? It's because there is randomness in the computation of the critical value, calpha.

multcomp uses mvtnorm::qmvt to compute calpha, which in turn uses as a stochastic algorithm to find the quantiles a multivariate normal distribution.

In effect this is what you did: You fix the random seed once and call a stochastic function multiple times. You get slightly different critical values.

set.seed(1234)
calpha <- qmvt(level, ...)
calpha <- qmvt(level, ...)
calpha <- qmvt(level, ...)

Instead fix the seed before calling the stochastic function. You get exactly the same critical value each time and the simultaneous confidence intervals won't change.

set.seed(1234)
calpha <- qmvt(level, ...)
set.seed(1234)
calpha <- qmvt(level, ...)
set.seed(1234)
calpha <- qmvt(level, ...)

Let's demonstrate this on your example.

library("broom")
library("multcomp")
library("tidyverse")

options(
  pillar.sigfig = 5
)

mod <- lm(
  formula = mpg ~ cyl + disp + hp + drat + wt + qsec + vs + gear,
  data = mtcars
)

# Usually we aren't interested in testing that the intercept is 0.
# So let's skip it.
K <- diag(length(coef(mod)))[-1, ]
rownames(K) <- names(coef(mod))[-1]
K
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#> cyl     0    1    0    0    0    0    0    0    0
#> disp    0    0    1    0    0    0    0    0    0
#> hp      0    0    0    1    0    0    0    0    0
#> drat    0    0    0    0    1    0    0    0    0
#> wt      0    0    0    0    0    1    0    0    0
#> qsec    0    0    0    0    0    0    1    0    0
#> vs      0    0    0    0    0    0    0    1    0
#> gear    0    0    0    0    0    0    0    0    1

modGLHT <- glht(
  model = mod,
  linfct = K
)

methods <- c("none", "single-step", "Shaffer", "Westfall")
names(methods) <- methods

There is randomness in computing the critical value, calpha. So we get slightly different simultaneous confidence intervals for the same set of hypotheses.

set.seed(1234)

methods %>%
  map_dfr(

    # `confint` silently ignores `adjusted`
    ~ tidy(confint(modGLHT, adjusted(type = .))),
    .id = "method"
  ) %>%
  filter(
    contrast == "cyl"
  )
#> # A tibble: 4 × 5
#>   method      contrast estimate conf.low conf.high
#>                          
#> 1 none        cyl      -0.52659  -3.3654    2.3122
#> 2 single-step cyl      -0.52659  -3.3665    2.3133
#> 3 Shaffer     cyl      -0.52659  -3.3670    2.3138
#> 4 Westfall    cyl      -0.52659  -3.3624    2.3093

We fix the random seed before computing the critical value calpha, so the simultaneous confidence intervals are computed in a reproducible way.

methods %>%
  map_dfr(

    # `confint` silently ignores `adjusted`
    ~ {
      set.seed(1234)
      tidy(confint(modGLHT, adjusted(type = .)))
    },
    .id = "method"
  ) %>%
  filter(
    contrast == "cyl"
  )
#> # A tibble: 4 × 5
#>   method      contrast estimate conf.low conf.high
#>                          
#> 1 none        cyl      -0.52659  -3.3648    2.3116
#> 2 single-step cyl      -0.52659  -3.3648    2.3116
#> 3 Shaffer     cyl      -0.52659  -3.3648    2.3116
#> 4 Westfall    cyl      -0.52659  -3.3648    2.3116

Updates

Bonferroni controls the familiy-wise error rate (FWER) at the α level and simultaneous confidence intervals are 100(1-α)% family-wise confidence intervals. So Bonferroni and simultaneous CIs have the FWER concept in common. But they are not equivalent. Why?

The Bonferroni method divides the p-values of m simultaneous hypotheses by m. The procedure is conservative: it guarantees that the FWER is ≤ α rather than = α and it tends to over-correct, a lot.

In general, we don't know the dependencies between hypotheses. However, we do with regression because we have an estimate of the covariance matrix Var{(β₁, β₂, ..., β_p)}. So we can estimate correlations between hypotheses which are functions of the model coefficients (such as β₁ = 0, β₂ = 0, ..., β_p = 0).

And this is how multcomp computes simultaneous confidence intervals: it takes into account the correlations between hypotheses to correct the critical value calpha "just the right amount". While Bonferroni ignores this information to apply the most stringent correction.

The simultaneous confidence intervals are determined by the set of hypotheses being tested. Changing the other hypotheses can lead to a different confidence interval for the same individual hypothesis because the overall coverage depends in a complex way on the correlations between all hypotheses.

set.seed(1234)

modGLHT <- glht(
  model = mod,
  linfct = c("cyl = 0", "disp = 0")
)

tidy(confint(modGLHT, adjusted(type = .)))
#> # A tibble: 2 × 4
#>   contrast  estimate  conf.low conf.high
#>                     
#> 1 cyl      -0.52659  -2.8330    1.7798  
#> 2 disp      0.016565 -0.014587  0.047716

Correcting confidence intervals for multiple comparisons using the multcomp package

Answers (1)

Related Questions