Marcin
Marcin

Reputation: 8044

Is there a way to obtain coefficients for each step of the optimization algorithm in glm function?

When one performs a logit regression in R, it is possible to obtain coefficients after the optimization algorithm has converged (or not) with coefficients() function:

library(MASS)
data(menarche)
glm.out = glm(cbind(Menarche, Total-Menarche) ~ Age,
               family=binomial(logit), data=menarche)
coefficients(glm.out)
## (Intercept)         Age 
## -21.226395    1.631968

Is there a way to obtain coefficients for each step of the optimization algorithm to trace its steps?

Upvotes: 12

Views: 827

Answers (2)

AdamO
AdamO

Reputation: 4910

I recommend performing a by-hand approach. Modify argument to glm.control and supply the fitted values within a while loop. See as follows:

library(MASS)
data(menarche)
converged <- F
coeftrace <- matrix(0, 25, 2)
i <- 1
mu <- NULL

while(!converged & i <= 25) {
  glm.out = glm(cbind(Menarche, Total-Menarche) ~ Age,
                mustart = mu,
                family=binomial(logit), data=menarche, 
                control=glm.control(maxit=1))
  mu <- glm.out$fitted.values
  coeftrace[i,] <- coef(glm.out)
  i <- i+1
  converged<- glm.out$converged
}
coefficients(glm.out)

Gives:

> coeftrace
           [,1]     [,2]
 [1,] -20.67365 1.589536
 [2,] -21.20685 1.630468
 [3,] -21.22637 1.631966
 [4,] -21.22639 1.631968
 [5,]   0.00000 0.000000
 [6,]   0.00000 0.000000
 [7,]   0.00000 0.000000
 [8,]   0.00000 0.000000

The benefit is these values can be used for analysis, such as plotting. Note, the initial conditions supplied by glm is not captured in this answer or in @GGrothendieck's answer either. That is, the glm.fit behavior is to use binomial()$initialize within the glm.fit environment to set initial conditions, and would correspond to coefficients. This approach is more efficient than supplying wild starting parameters. If I change mu above, I may add 4 additional iterations.

library(MASS)
data(menarche)
converged <- F
coeftrace <- matrix(0, 25, 2)
i <- 1
mu <- rep(0.1, 25)

while(!converged & i <=25) {
  glm.out = glm(cbind(Menarche, Total-Menarche) ~ Age,
                mustart = mu,
                family=binomial(logit), data=menarche, 
                control=glm.control(maxit=1))
  mu <- glm.out$fitted.values
  coeftrace[i,] <- coef(glm.out)
  i <- i+1
  converged<- glm.out$converged
}

           [,1]      [,2]
 [1,] -18.18140 1.5441246
 [2,] -10.04138 0.6936207
 [3,] -11.95066 0.9018941
 [4,] -16.27171 1.2450011
 [5,] -19.71743 1.5147288
 [6,] -21.07972 1.6206137
 [7,] -21.22499 1.6318598
 [8,] -21.22639 1.6319683
 [9,] -21.22639 1.6319683
[10,]   0.00000 0.0000000
[11,]   0.00000 0.0000000

Upvotes: 1

G. Grothendieck
G. Grothendieck

Reputation: 269461

The internals of glm.fit have changed (see comment from @John) so use this instead. It does not rely on line positions of the internals but rather intercepts each instance of cat in glm.fit and adds a message to iteration message so although it still depends on the internals it should be a bit less fragile. This worked for me in R 4.1 and 4.2.

library(MASS)
data(menarche)

trace(glm.fit, quote(cat <- function(...) {
  base::cat(...)
  if (...length() >= 3 && identical(..3, " Iterations - ")) print(coefold)
}))
glm.out = glm(cbind(Menarche, Total-Menarche) ~ Age,
                     family=binomial(logit), data=menarche,
                     control = glm.control(trace = TRUE))
untrace(glm.fit)

Previous solution

The control= argument with the value shown causes the deviance to print and the trace statement will cause the coefficient values to print:

trace(glm.fit, quote(print(coefold)), at = list(c(22, 4, 8, 4, 19, 3)))
glm.out = glm(cbind(Menarche, Total-Menarche) ~ Age,
                     family=binomial(logit), data=menarche,
                     control = glm.control(trace = TRUE))

The output will look like this:

Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  .... step 22,4,8,4,19,3 
NULL
Deviance = 27.23412 Iterations - 1
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  .... step 22,4,8,4,19,3 
[1] -20.673652   1.589536
Deviance = 26.7041 Iterations - 2
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  .... step 22,4,8,4,19,3 
[1] -21.206854   1.630468
Deviance = 26.70345 Iterations - 3
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  .... step 22,4,8,4,19,3 
[1] -21.226370   1.631966
Deviance = 26.70345 Iterations - 4

To remove the trace use:

untrace(glm.fit)

Note that in the trace call, coefold is the name of a variable used internally in glm.fit source code and the numbers used refer to statement numbers in the source code and so either could need to be changed if glm.fit source changes. I am using "R version 3.2.2 Patched (2015-10-19 r69550)".

Upvotes: 11

Related Questions