Michael Ohlrogge

Reputation: 10980

Get Regression Coefficient Names with R Bootstrap

I'm using the boot package in R to calculate bootstrapped SEs and confidence intervals. I'm trying to find an elegant and efficient way of getting the names of my parameters along with the bootstrap distribution of their estimates. For instance, consider the simple example given here:

# Bootstrap 95% CI for regression coefficients 
library(boot)
# function to obtain regression weights 
bs = function(data, indices, formula) {
    d = data[indices,] # allows boot to select sample 
    fit = lm(formula, data=d)
    return(coef(fit))
}
# bootstrapping with 1000 replications 
results = boot(
    data=mtcars, 
    statistic=bs, 
    R=1000, 
    formula=mpg~wt+disp)

This works fine, except that the printed results label the coefficients only with numerical indices (t1*, t2*, ...):

# view results
results
Bootstrap Statistics :
       original        bias    std. error
t1* 34.96055404  0.1559289371 2.487617954
t2* -3.35082533 -0.0948558121 1.152123237
t3* -0.01772474  0.0002927116 0.008353625

Particularly when getting into long, complicated regression formulas, involving a variety of factor variables, it can take some work to keep track of precisely which indices go with which coefficient estimates.

I could of course refit my model outside of the bootstrap function and extract the names with names(coef(fit)), or use something else such as a call to model.matrix(). Both seem cumbersome, in terms of extra code as well as extra CPU and RAM.

How can I more easily get a vector of the coefficient names to pair with a vector of coefficient standard errors in situations like this?

UPDATE

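Based on the great answer from lmo, here is my code to get a basic regression table:

Names = names(results$t0)
# bootstrap SE = standard deviation of each column of replicates
SEs = apply(results$t, 2, sd)
Coefs = as.numeric(results$t0)
zVals = Coefs / SEs
Pvals = 2*pnorm(-abs(zVals))

# data.frame() keeps the numeric columns numeric; cbind() would coerce everything to character
Formatted_Results = data.frame(Names, Coefs, SEs, zVals, Pvals)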
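Since the question also mentions confidence intervals, the same names can be used to label them. The following is only a sketch, assuming percentile intervals are acceptable; the seed, the number of replications, and the names coef_names and ci are arbitrary choices for illustration. boot.ci() selects a statistic by position, so the names from t0 are attached afterwards.

```r
library(boot)

# Statistic function from the question: refit the model on each resample
bs <- function(data, indices, formula) {
  d <- data[indices, ]
  coef(lm(formula, data = d))
}

set.seed(42)  # arbitrary seed, only so the run is reproducible
results <- boot(data = mtcars, statistic = bs, R = 500,
                formula = mpg ~ wt + disp)

coef_names <- names(results$t0)

# One percentile interval per coefficient; columns 4 and 5 of the
# $percent matrix hold the lower and upper endpoints
ci <- t(sapply(seq_along(coef_names), function(i) {
  boot.ci(results, type = "perc", index = i)$percent[4:5]
}))
dimnames(ci) <- list(coef_names, c("lower", "upper"))
ci
```

The resulting matrix has one row per coefficient, so ci["wt", ] returns the interval for the wt slope without having to remember its position.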

Upvotes: 4

Views: 1314

Answers (1)

lmo

Reputation: 38500

The estimates obtained by applying the bootstrapped statistic function (here bs, which wraps lm) to the original data are stored in the "t0" element of the object returned by boot.

results$t0
(Intercept)          wt        disp 
34.96055404 -3.35082533 -0.01772474

This object preserves the names of the estimates from the original function call, which you can then access with names().

names(results$t0)
[1] "(Intercept)" "wt"          "disp"
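One way to use this, sketched here rather than taken from the boot documentation, is to copy the names onto a copy of the replicate matrix so that bootstrap draws can be selected by coefficient name. The variable reps, the seed, and R = 200 are arbitrary choices for illustration.

```r
library(boot)

# Statistic function from the question: refit the model on each resample
bs <- function(data, indices, formula) {
  d <- data[indices, ]
  coef(lm(formula, data = d))
}

set.seed(1)  # arbitrary seed, only so the run is reproducible
results <- boot(data = mtcars, statistic = bs, R = 200,
                formula = mpg ~ wt + disp)

# Work on a copy so the boot object itself is left unchanged
reps <- results$t
colnames(reps) <- names(results$t0)

# Columns can now be selected by name instead of by position
head(reps[, "wt"])

# Named vector of bootstrap standard errors
apply(reps, 2, sd)
```

This avoids refitting the model outside boot(), because t0 is already computed as part of the bootstrap call.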

Upvotes: 3
