Reputation: 8044
When one performs a logit regression in R, it is possible to obtain coefficients after the optimization algorithm has converged (or not) with coefficients()
function:
library(MASS)
data(menarche)
glm.out = glm(cbind(Menarche, Total-Menarche) ~ Age,
family=binomial(logit), data=menarche)
coefficients(glm.out)
## (Intercept) Age
## -21.226395 1.631968
Is there a way to obtain coefficients for each step of the optimization algorithm to trace its steps?
Upvotes: 12
Views: 827
Reputation: 4910
I recommend performing a by-hand approach. Modify argument to glm.control
and supply the fitted values within a while
loop. See as follows:
library(MASS)
data(menarche)
converged <- F
coeftrace <- matrix(0, 25, 2)
i <- 1
mu <- NULL
while(!converged & i <= 25) {
glm.out = glm(cbind(Menarche, Total-Menarche) ~ Age,
mustart = mu,
family=binomial(logit), data=menarche,
control=glm.control(maxit=1))
mu <- glm.out$fitted.values
coeftrace[i,] <- coef(glm.out)
i <- i+1
converged<- glm.out$converged
}
coefficients(glm.out)
Gives:
> coeftrace
[,1] [,2]
[1,] -20.67365 1.589536
[2,] -21.20685 1.630468
[3,] -21.22637 1.631966
[4,] -21.22639 1.631968
[5,] 0.00000 0.000000
[6,] 0.00000 0.000000
[7,] 0.00000 0.000000
[8,] 0.00000 0.000000
The benefit is these values can be used for analysis, such as plotting. Note, the initial conditions supplied by glm
is not captured in this answer or in @GGrothendieck's answer either. That is, the glm.fit
behavior is to use binomial()$initialize
within the glm.fit
environment to set initial conditions, and would correspond to coefficients. This approach is more efficient than supplying wild starting parameters. If I change mu
above, I may add 4 additional iterations.
library(MASS)
data(menarche)
converged <- F
coeftrace <- matrix(0, 25, 2)
i <- 1
mu <- rep(0.1, 25)
while(!converged & i <=25) {
glm.out = glm(cbind(Menarche, Total-Menarche) ~ Age,
mustart = mu,
family=binomial(logit), data=menarche,
control=glm.control(maxit=1))
mu <- glm.out$fitted.values
coeftrace[i,] <- coef(glm.out)
i <- i+1
converged<- glm.out$converged
}
[,1] [,2]
[1,] -18.18140 1.5441246
[2,] -10.04138 0.6936207
[3,] -11.95066 0.9018941
[4,] -16.27171 1.2450011
[5,] -19.71743 1.5147288
[6,] -21.07972 1.6206137
[7,] -21.22499 1.6318598
[8,] -21.22639 1.6319683
[9,] -21.22639 1.6319683
[10,] 0.00000 0.0000000
[11,] 0.00000 0.0000000
Upvotes: 1
Reputation: 269461
The internals of glm.fit have changed (see comment from @John) so use this instead. It does not rely on line positions of the internals but rather intercepts each instance of cat in glm.fit and adds a message to iteration message so although it still depends on the internals it should be a bit less fragile. This worked for me in R 4.1 and 4.2.
library(MASS)
data(menarche)
trace(glm.fit, quote(cat <- function(...) {
base::cat(...)
if (...length() >= 3 && identical(..3, " Iterations - ")) print(coefold)
}))
glm.out = glm(cbind(Menarche, Total-Menarche) ~ Age,
family=binomial(logit), data=menarche,
control = glm.control(trace = TRUE))
untrace(glm.fit)
Previous solution
The control=
argument with the value shown causes the deviance to print and the trace
statement will cause the coefficient values to print:
trace(glm.fit, quote(print(coefold)), at = list(c(22, 4, 8, 4, 19, 3)))
glm.out = glm(cbind(Menarche, Total-Menarche) ~ Age,
family=binomial(logit), data=menarche,
control = glm.control(trace = TRUE))
The output will look like this:
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
NULL
Deviance = 27.23412 Iterations - 1
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] -20.673652 1.589536
Deviance = 26.7041 Iterations - 2
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] -21.206854 1.630468
Deviance = 26.70345 Iterations - 3
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] -21.226370 1.631966
Deviance = 26.70345 Iterations - 4
To remove the trace use:
untrace(glm.fit)
Note that in the trace
call, coefold
is the name of a variable used internally in glm.fit
source code and the numbers used refer to statement numbers in the source code and so either could need to be changed if glm.fit
source changes. I am using "R version 3.2.2 Patched (2015-10-19 r69550)".
Upvotes: 11