Juan Pedro Ross
Juan Pedro Ross

Reputation: 1

Coefficients in each step of stepwise regression

I am using a stepwise regression in R, with this code:

model_scicareer_all <- glm(scicareer ~ .,
                           family = binomial(link = "logit"), data = clean_data)
summary(model_scicareer_all)

intercept_only <- glm(scicareer ~ -1,
                      family = binomial(link = "logit"), data = clean_data)

forward <- step(intercept_only, direction='forward', scope=formula(model_scicareer_all), trace=1000)

The code works without a problem. However, I would like to get the coefficients in each step. Is that possible? (I know that SPSS does this.)

I tried with different inputs, but I could not print the coefficients of each step

Upvotes: 0

Views: 95

Answers (3)

Onyambu
Onyambu

Reputation: 79328

step has the keep argument. Use this to obtain the coefficients of each model run:

example:

final <- step(lm(mpg~1,mtcars), DF2formula(mtcars),keep = \(x,y)list(coef(x)), trace = FALSE)
c(final$keep)
[[1]]
(Intercept) 
   20.09062 

[[2]]
(Intercept)          wt 
  37.285126   -5.344472 

[[3]]
(Intercept)          wt         cyl 
  39.686261   -3.190972   -1.507795 

[[4]]
(Intercept)          wt         cyl          hp 
 38.7517874  -3.1669731  -0.9416168  -0.0180381 

Note that you can keep anything. including the models themselves:

final <- step(lm(mpg~1,mtcars), DF2formula(mtcars),keep = \(x, y)list(x), trace = FALSE)

[[1]]

Call:
lm(formula = mpg ~ 1, data = mtcars)

Coefficients:
(Intercept)  
      20.09  


[[2]]

Call:
lm(formula = mpg ~ wt, data = mtcars)

Coefficients:
(Intercept)           wt  
     37.285       -5.344  


[[3]]

Call:
lm(formula = mpg ~ wt + cyl, data = mtcars)

Coefficients:
(Intercept)           wt          cyl  
     39.686       -3.191       -1.508  


[[4]]

Call:
lm(formula = mpg ~ wt + cyl + hp, data = mtcars)

Coefficients:
(Intercept)           wt          cyl           hp  
   38.75179     -3.16697     -0.94162     -0.01804  

Upvotes: 3

Ben Bolker
Ben Bolker

Reputation: 226741

You can get the coefficients in a hacky way by capturing the output, extracting the formulas, and re-running the model for each formula. The last step below (dplyr::bind_rows()) is only necessary if you want to combine the coefficients in a table with NA values for coefficients not included in a particular model (this could certainly be done in base R as well, but not as compactly).

fm <- lm(mpg ~ ., data = mtcars)
cc <- capture.output(step(fm))
## get all lines with a tilde but *without* a comma (the finally selected
##  model includes the formula, we don't want that line)
forms <- grep("~", cc, value = TRUE) |> 
         grep(pattern = ",", invert = TRUE, value = TRUE)
## refit the model, get coefficients
coefs <- lapply(forms, \(f) coef(update(fm, formula = f)))
dplyr::bind_rows(coefs)
# A tibble: 8 × 11
  `(Intercept)`    cyl    disp      hp   drat    wt  qsec     vs    am   gear
          <dbl>  <dbl>   <dbl>   <dbl>  <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl>
1         12.3  -0.111  0.0133 -0.0215  0.787 -3.72 0.821  0.318  2.52  0.655
2         11.0  NA      0.0128 -0.0219  0.835 -3.69 0.842  0.390  2.58  0.712
3          9.77 NA      0.0121 -0.0210  0.875 -3.71 0.911 NA      2.52  0.760
4          9.20 NA      0.0155 -0.0247  0.810 -4.13 1.01  NA      2.59  0.606
5         10.7  NA      0.0131 -0.0218  1.02  -4.04 0.991 NA      2.98 NA    
6         14.4  NA      0.0112 -0.0212 NA     -4.08 1.01  NA      3.47 NA    
7         17.4  NA     NA      -0.0176 NA     -3.24 0.811 NA      2.93 NA    
8          9.62 NA     NA      NA      NA     -3.92 1.23  NA      2.94 NA    
[...]

Upvotes: 0

G. Grothendieck
G. Grothendieck

Reputation: 270055

step has a trace argument but it seems it does not return coefficients. Instead use the trace function as shown.

trace(glm, exit = quote(print(coef(returnValue()))))
fm <- glm(mpg ~., data = mtcars)
step(fm)
untrace(flm)

Upvotes: 0

Related Questions