Reputation: 1
I am using a stepwise regression in R, with this code:
model_scicareer_all <- glm(scicareer ~ .,
family = binomial(link = "logit"), data = clean_data)
summary(model_scicareer_all)
intercept_only <- glm(scicareer ~ -1,
family = binomial(link = "logit"), data = clean_data)
forward <- step(intercept_only, direction='forward', scope=formula(model_scicareer_all), trace=1000)
The code works without a problem. However, I would like to get the coefficients in each step. Is that possible? (I know that SPSS does this.)
I tried with different inputs, but I could not print the coefficients of each step
Upvotes: 0
Views: 95
Reputation: 79328
step has the keep
argument. Use this to obtain the coefficients of each model run:
example:
final <- step(lm(mpg~1,mtcars), DF2formula(mtcars),keep = \(x,y)list(coef(x)), trace = FALSE)
c(final$keep)
[[1]]
(Intercept)
20.09062
[[2]]
(Intercept) wt
37.285126 -5.344472
[[3]]
(Intercept) wt cyl
39.686261 -3.190972 -1.507795
[[4]]
(Intercept) wt cyl hp
38.7517874 -3.1669731 -0.9416168 -0.0180381
Note that you can keep anything. including the models themselves:
final <- step(lm(mpg~1,mtcars), DF2formula(mtcars),keep = \(x, y)list(x), trace = FALSE)
[[1]]
Call:
lm(formula = mpg ~ 1, data = mtcars)
Coefficients:
(Intercept)
20.09
[[2]]
Call:
lm(formula = mpg ~ wt, data = mtcars)
Coefficients:
(Intercept) wt
37.285 -5.344
[[3]]
Call:
lm(formula = mpg ~ wt + cyl, data = mtcars)
Coefficients:
(Intercept) wt cyl
39.686 -3.191 -1.508
[[4]]
Call:
lm(formula = mpg ~ wt + cyl + hp, data = mtcars)
Coefficients:
(Intercept) wt cyl hp
38.75179 -3.16697 -0.94162 -0.01804
Upvotes: 3
Reputation: 226741
You can get the coefficients in a hacky way by capturing the output, extracting the formulas, and re-running the model for each formula. The last step below (dplyr::bind_rows()
) is only necessary if you want to combine the coefficients in a table with NA
values for coefficients not included in a particular model (this could certainly be done in base R as well, but not as compactly).
fm <- lm(mpg ~ ., data = mtcars)
cc <- capture.output(step(fm))
## get all lines with a tilde but *without* a comma (the finally selected
## model includes the formula, we don't want that line)
forms <- grep("~", cc, value = TRUE) |>
grep(pattern = ",", invert = TRUE, value = TRUE)
## refit the model, get coefficients
coefs <- lapply(forms, \(f) coef(update(fm, formula = f)))
dplyr::bind_rows(coefs)
# A tibble: 8 × 11
`(Intercept)` cyl disp hp drat wt qsec vs am gear
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 12.3 -0.111 0.0133 -0.0215 0.787 -3.72 0.821 0.318 2.52 0.655
2 11.0 NA 0.0128 -0.0219 0.835 -3.69 0.842 0.390 2.58 0.712
3 9.77 NA 0.0121 -0.0210 0.875 -3.71 0.911 NA 2.52 0.760
4 9.20 NA 0.0155 -0.0247 0.810 -4.13 1.01 NA 2.59 0.606
5 10.7 NA 0.0131 -0.0218 1.02 -4.04 0.991 NA 2.98 NA
6 14.4 NA 0.0112 -0.0212 NA -4.08 1.01 NA 3.47 NA
7 17.4 NA NA -0.0176 NA -3.24 0.811 NA 2.93 NA
8 9.62 NA NA NA NA -3.92 1.23 NA 2.94 NA
[...]
Upvotes: 0
Reputation: 270055
step
has a trace
argument but it seems it does not return coefficients. Instead use the trace
function as shown.
trace(glm, exit = quote(print(coef(returnValue()))))
fm <- glm(mpg ~., data = mtcars)
step(fm)
untrace(flm)
Upvotes: 0