Marco Mello
Marco Mello

Reputation: 175

bootstrap standard errors of a linear regression in R

I have a lm object and I would like to bootstrap only its standard errors. In practice I want to use only part of the sample (with replacement) at each replication and get a distribution of standard erros. Then, if possible, I would like to display the summary of the original linear regression but with the bootstrapped standard errors and the corresponding p-values (in other words same beta coefficients but different standard errors).

Edited: In summary I want to "modify" my lm object by having the same beta coefficients of the original lm object that I ran on the original data, but having the bootstrapped standard errors (and associated t-stats and p-values) obtained by computing this lm regression several times on different subsamples (with replacement).

So my lm object looks like

    Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                      3.812793   0.095282  40.016  < 2e-16 ***
x                                -0.904729   0.284243  -3.183  0.00147 ** 
z                                0.599258   0.009593  62.466  < 2e-16 ***
x*z                              0.091511   0.029704   3.081  0.00208 ** 

but the associated standard errors are wrong, and I would like to estimate them by replicating this linear regression 1000 times (replications) on different subsample (with replacement).

Is there a way to do this? can anyone help me?

Thank you for your time. Marco

Upvotes: 2

Views: 3564

Answers (1)

Rui Barradas
Rui Barradas

Reputation: 76402

What you ask can be done following the line of the code below.
Since you have not posted an example dataset nor the model to fit, I will use the built in dataset mtcars an a simple formula with two continuous predictors.

library(boot)

boot_function <- function(data, indices, formula){
  d <- data[indices, ]
  obj <- lm(formula, d)
  coefs <- summary(obj)$coefficients
  coefs[, "Std. Error"]
}

set.seed(8527)

fmla <- as.formula("mpg ~ hp * cyl")
seboot <- boot(mtcars, boot_function, R = 1000, formula = fmla)

colMeans(seboot$t)
##[1] 6.511530646 0.068694001 1.000101450 0.008804784

I believe that it is possible to use the code above for most needs with numeric response and predictors.

Upvotes: 3

Related Questions