Indrajeet Patil
Indrajeet Patil

Reputation: 4879

wrong confidence intervals from `stats::confint` for `mlm` objects?

I am not sure here if I am doing something wrong or this is a bug in confint function in R itself but I am getting confidence intervals for regression estimate which don't contain the estimate.

Here is reprex:

# model (converting all numeric columns in data to z-scores)
mod <-
  stats::lm(
    formula = cbind(mpg, disp) ~ wt,
    data = purrr::modify_if(.x = mtcars, .p = is.numeric, .f = scale)
  )

# tidy dataframe
broom::tidy(mod, conf.int = TRUE)
#> # A tibble: 4 x 8
#>   response term         estimate std.error statistic  p.value conf.low conf.high
#>   <chr>    <chr>           <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 mpg      (Intercept)  8.05e-17    0.0893  9.01e-16 1.00e+ 0   -0.182     0.182
#> 2 mpg      wt          -8.68e- 1    0.0908 -9.56e+ 0 1.29e-10   -1.05     -0.682
#> 3 disp     (Intercept) -1.70e-16    0.0826 -2.06e-15 1.00e+ 0   -0.182     0.182
#> 4 disp     wt           8.88e- 1    0.0840  1.06e+ 1 1.22e-11   -1.05     -0.682


confint(mod)
#>                   2.5 %     97.5 %
#> :(Intercept) -0.1824544  0.1824544
#> :wt          -1.0530332 -0.6822855
#> :(Intercept) -0.1824544  0.1824544
#> :wt          -1.0530332 -0.6822855

This becomes more conspicuous if you plot the estimates:

<sup>Created on 2020-02-22 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0.9001)</sup>

Am I doing something wrong here? Or this is expected behavior?

Upvotes: 2

Views: 314

Answers (1)

StupidWolf
StupidWolf

Reputation: 46898

confint() does work with mlm now:

confint(lm(mpg~wt,data=mtcars))
                2.5 %    97.5 %
(Intercept) 33.450500 41.119753
wt          -6.486308 -4.202635

confint(lm(disp~wt,data=mtcars))
                2.5 %   97.5 %
(Intercept) -204.0914 -58.2054
wt            90.7579 134.1984

confint(lm(cbind(mpg,disp)~wt,data=mtcars))
                       2.5 %     97.5 %
mpg:(Intercept)    33.450500  41.119753
mpg:wt             -6.486308  -4.202635
disp:(Intercept) -204.091436 -58.205395
disp:wt            90.757897 134.198380

The problem is with using scale inside dplyr or purrr, I used as.data.frame after scale because all your columns are numeric anyway:

confint(lm(cbind(mpg,disp)~wt,data=as.data.frame(scale(mtcars))))

                          2.5 %     97.5 %
    mpg:(Intercept)  -0.1824544  0.1824544
    mpg:wt           -1.0530332 -0.6822855
    disp:(Intercept) -0.1687740  0.1687740
    disp:wt           0.7165054  1.0594545

For dplyr and purrr, you can see that the variable names are messed up too:

library(dplyr)
library(purrr)

confint(lm(cbind(mpg,disp)~wt,data=modify_if(.x = mtcars, .p = is.numeric, .f = scale)))

                  2.5 %     97.5 %
:(Intercept) -0.1824544  0.1824544
:wt          -1.0530332 -0.6822855
:(Intercept) -0.1824544  0.1824544
:wt          -1.0530332 -0.6822855

confint(lm(cbind(mpg,disp)~wt,data=mutate_if(mtcars,is.numeric,scale))

                  2.5 %     97.5 %
:(Intercept) -0.1824544  0.1824544
:wt          -1.0530332 -0.6822855
:(Intercept) -0.1824544  0.1824544
:wt          -1.0530332 -0.6822855

I am guessing when you do scale, the attributes that are carried forward messes with something in confint (still looking at the code.) One way you can solve this is to force the output to a vector (using c or as.numeric):

confint(lm(cbind(mpg,disp)~wt,data=modify_if(mtcars,is.numeric,~c(scale(.x)))))

or

confint(lm(cbind(mpg,disp)~wt,data=mutate_if(mtcars,is.numeric,~c(scale(.x)))))

                      2.5 %     97.5 %
mpg:(Intercept)  -0.1824544  0.1824544
mpg:wt           -1.0530332 -0.6822855
disp:(Intercept) -0.1687740  0.1687740
disp:wt           0.7165054  1.0594545

Unfortunately, the tidy(..,conf.int=TRUE) does not quite work, so for plotting you need to use the confint from stats with some manipulation.

Upvotes: 3

Related Questions