teunbrand

Reputation: 38053

How to interpret coefficients of logistic regression

I'm trying to figure out how the coefficients of logistic regression with a polynomial term relate to predictions. Specifically, I'm interested in the location on the x-axis where the prediction is highest. Example below:

set.seed(42)

# Setup some dummy data
x <- 1:200
y <- rep(0, length(x))
y[51:150] <- rbinom(100, 1, 0.5)

# Fit a model
family <- binomial()
model  <- glm(y ~ poly(x, 2), family = family)

# Illustrate model
plot(x, y)
lines(x, family$linkinv(predict(model)), col = 2)

The model above gives me these coefficients:

coef(model)
#> (Intercept) poly(x, 2)1 poly(x, 2)2 
#>   -1.990317   -3.867855  -33.299893

Created on 2021-08-03 by the reprex package (v1.0.0)

The manual page for poly() states the following:

The orthogonal polynomial is summarized by the coefficients, which can be used to evaluate it via the three-term recursion given in Kennedy & Gentle (1980, pp. 343–4), and used in the predict part of the code.

However, I don't have access to the book, nor am I able to discern from the predict.glm S3 method how these coefficients are handled. Is there a way to reconstruct the location of the summit (around 100 in the example) from the coefficients alone (i.e. without using predict() to find the maximum)?

Upvotes: 4

Views: 533

Answers (2)

mastropi

Reputation: 1451

Derivation of the location of the predicted maximum from the theoretical expressions of the orthogonal polynomials

I got hold of a copy of the "Statistical Computing" book by Kennedy and Gentle (1980) referenced in the documentation of poly, and now share my findings about the calculation of the orthogonal polynomials and how we can use them to find the location x of the maximum predicted value.

The orthogonal polynomials presented in the book (pp. 343-4) are monic (i.e. the highest-order coefficient is always 1) and are obtained by the following recurrence procedure:

p(0,x)  = 1
p(-1,x) = 0
p(j+1,x) = (x - rho(j+1)) * p(j,x) - gamma(j) * p(j-1,x),   for j = 0, 1, ..., q-1

where q is the number of orthogonal polynomials considered.
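
To make the recursion concrete, here is a minimal R sketch of my own (the helper name monic_poly is mine, not from the book or from poly()) that evaluates the monic polynomials on the data points, given the rho and gamma parameters defined below:

# Evaluate the monic orthogonal polynomials p(0,x), ..., p(q,x) on the data
# points via the three-term recursion. Illustrative helper: `rho` has length q
# (rho(1), ..., rho(q)) and `gamma` has length q - 1 (gamma(1), ..., gamma(q-1)).
monic_poly <- function(x, rho, gamma = numeric(0)) {
  q <- length(rho)
  P <- matrix(0, nrow = length(x), ncol = q + 1)  # columns hold p(0,x), ..., p(q,x)
  P[, 1] <- 1                                     # p(0,x) = 1 (and p(-1,x) = 0)
  P[, 2] <- (x - rho[1]) * P[, 1]                 # p(1,x) = (x - rho(1)) * p(0,x)
  if (q >= 2) for (j in 2:q) {
    P[, j + 1] <- (x - rho[j]) * P[, j] - gamma[j - 1] * P[, j - 1]
  }
  P
}

With the values rho1, rho2 and gamma1 derived below, the last two columns of monic_poly(x, c(rho1, rho2), gamma1) are proportional to the two columns of poly(x, 2); they only differ by the normalization constants discussed further down.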

Note the following relationship of the above terminology with the documentation of poly:

  1. The "three-term recursion" appearing in the excerpt included in your question is the RHS of the third expression which has precisely three terms.
  2. The rho(j+1) coefficients in the third expression are called "centering constants".
  3. The gamma(j) coefficients in the third expression do not have a name in the documentation but they are directly related to the "normalization constants", as seen below.

For reference, here I paste the relevant excerpt of the "Value" section of the poly documentation:

A matrix with rows corresponding to points in x and columns corresponding to the degree, with attributes "degree" specifying the degrees of the columns and (unless raw = TRUE) "coefs" which contains the centering and normalization constants used in constructing the orthogonal polynomials

Going back to the recurrence, we can derive the values of the parameters rho(j+1) and gamma(j) from the third expression by imposing the orthogonality condition of p(j+1) w.r.t. p(j) and p(j-1).
(It is important to note that the orthogonality condition is not an integral but a summation over the n observed x points, so the polynomial coefficients depend on the data! This is not the case, for instance, for the Chebyshev orthogonal polynomials.)
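This discrete orthogonality is easy to verify directly. As a quick check (my own addition), the columns returned by poly() are orthonormal w.r.t. summation over the observed points:

# The cross-products sum_over_i{ r(j,x_i) * r(k,x_i) } form (numerically)
# the identity matrix: off-diagonal entries ~0, diagonal entries 1.
op <- poly(x, 2)
round(crossprod(op), 10)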

The expressions for the parameters become:
rho(j+1) = sum_over_i{ x_i * p(j,x_i)^2 } / sum_over_i{ p(j,x_i)^2 }
gamma(j) = sum_over_i{ p(j,x_i)^2 } / sum_over_i{ p(j-1,x_i)^2 }

For the polynomials of orders 1 and 2 used in your regression, we get the following expressions, already written in R code:

# First we define the number of observations in the data
n = length(x)

# For p1(x):
# p1(x) = (x - rho1) p0(x)      (since p_{-1}(x) = 0)
rho1 = mean(x)

# For p2(x)
# p2(x) = (x - rho2) p1(x) - gamma1
gamma1 = var(x) * (n-1)/n
rho2 = sum( x * (x - mean(x))^2 ) / (n*gamma1)

for which we get:

> c(rho1, rho2, gamma1)
[1]  100.50  100.50 3333.25

Note that the coefs attribute of poly(x,2) is:

> attr(poly(x,2), "coefs")
$alpha
[1] 100.5 100.5

$norm2
[1]          1        200     666650 1777555560

where:

  - $alpha contains the centering constants, i.e. the rho values, which coincide with ours. (Incidentally, the centering constants are all equal to the average of x, for any q, whenever the distribution of x is symmetric; observed and proved.)
  - $norm2 contains the normalization constants, in this case for p(-1,x), p(0,x), p(1,x), and p(2,x). These are the constants c(j) that normalize the polynomials in the recurrence formula, by dividing them by sqrt(c(j)), so that the resulting polynomials r(j,x) satisfy sum_over_i{ r(j,x_i)^2 } = 1. These r(j,x) are the polynomials stored in the object returned by poly().
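
As a sanity check (my own addition, not part of the original derivation), we can rebuild the normalized polynomials r(1,x) and r(2,x) from rho1, rho2 and gamma1 and compare them against the columns of poly(x, 2):

# Rebuild p1(x) and p2(x) from the recurrence parameters, normalize them with
# the norm2 constants, and compare with the columns stored by poly():
p1 <- x - rho1
p2 <- (x - rho2) * p1 - gamma1
nrm <- attr(poly(x, 2), "coefs")$norm2
r12 <- cbind(p1 / sqrt(nrm[3]), p2 / sqrt(nrm[4]))
max(abs(poly(x, 2) - r12))   # ~0 (differences only at floating-point level)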

From the expressions given above, we see that gamma(j) is precisely the ratio of two consecutive normalization constants, namely: gamma(j) = c(j) / c(j-1).

We can check that our gamma1 value coincides with this ratio by computing:

gamma1 == attr(poly(x,2), "coefs")$norm2[3] / attr(poly(x,2), "coefs")$norm2[2]

which returns TRUE.

Going back to your problem of finding the maximum of the values predicted by your model, we can:

  1. Express the predicted value as a function of r(1,x) and r(2,x) and the coefficients from the logistic regression, namely:

    pred(x) = beta0 + beta1 * r(1,x) + beta2 * r(2,x)

  2. Differentiate the expression w.r.t. x, set the derivative to 0 and solve for x, as spelled out below.
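
Spelling out step 2: since r(1,x) = (x - rho1) / alpha1 and r(2,x) = ( (x - rho2) * (x - rho1) - gamma1 ) / alpha2, with alpha(j) = sqrt( c(j) ) as computed below, the derivative of the linear predictor is:

d pred(x) / dx = beta1/alpha1 + beta2/alpha2 * (2*x - rho1 - rho2) = 0

and solving for x gives:

xmax = ( beta2/alpha2 * (rho1 + rho2) - beta1/alpha1 ) / (2 * beta2/alpha2)

which is exactly the expression computed in the code below.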

In R code:

# Get the normalization factors alpha(j) = sqrt( c(j) ), used to obtain r(j,x)
# from p(j,x) as r(j,x) = p(j,x) / sqrt( c(j) ) = p(j,x) / alpha(j)
# (not to be confused with the $alpha attribute, which holds the centering constants)
alpha1 = sqrt( attr(poly(x,2), "coefs")$norm2[3] )
alpha2 = sqrt( attr(poly(x,2), "coefs")$norm2[4] )

# Get the logistic regression coefficients (beta1 and beta2)
coef1 = as.numeric( model$coeff["poly(x, 2)1"] )
coef2 = as.numeric( model$coeff["poly(x, 2)2"] )

# Compute the x at which the maximum occurs, from the expression obtained by
# differentiating the linear predictor pred(x) = beta0 + beta1*r(1,x) + beta2*r(2,x)
# w.r.t. x and setting the derivative to 0.
xmax = ( alpha2^-1 * coef2 * (rho1 + rho2) - alpha1^-1 * coef1 ) / (2 * alpha2^-1 * coef2)

which gives:

> xmax
[1] 97.501114

i.e. the same value obtained with the "empirical" method described in my other answer below.


The full code to obtain the location x of the maximum of the predicted values, starting off from the code you provided, is:

# First we define the number of observations in the data
n = length(x)

# Parameters for p1(x):
# p1(x) = (x - rho1) p0(x)      (since p_{-1}(x) = 0)
rho1 = mean(x)

# Parameters for p2(x)
# p2(x) = (x - rho2) p1(x) - gamma1
gamma1 = var(x) * (n-1)/n
rho2 = mean( x * (x - mean(x))^2 ) / gamma1

# Get the normalization factors alpha(j) = sqrt( c(j) ), used to obtain r(j,x)
# from p(j,x) as r(j,x) = p(j,x) / sqrt( c(j) ) = p(j,x) / alpha(j)
# (not to be confused with the $alpha attribute, which holds the centering constants)
alpha1 = sqrt( attr(poly(x,2), "coefs")$norm2[3] )
alpha2 = sqrt( attr(poly(x,2), "coefs")$norm2[4] )

# Get the logistic regression coefficients (beta1 and beta2)
coef1 = as.numeric( model$coeff["poly(x, 2)1"] )
coef2 = as.numeric( model$coeff["poly(x, 2)2"] )

# Compute the x at which the maximum occurs, from the expression obtained by
# differentiating the linear predictor pred(x) = beta0 + beta1*r(1,x) + beta2*r(2,x)
# w.r.t. x and setting the derivative to 0.
( xmax = ( alpha2^-1 * coef2 * (rho1 + rho2) - alpha1^-1 * coef1 ) / (2 * alpha2^-1 * coef2) )
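
As a final cross-check (my own addition), the analytical xmax can be compared against a brute-force search of the predictions on a fine grid, which must agree up to the grid resolution:

# Evaluate the model on a fine grid and locate the maximum numerically;
# the link function is monotonic, so the maximum of the linear predictor
# coincides with the maximum of the predicted probability.
xx <- seq(min(x), max(x), by = 0.001)
xx[which.max(predict(model, newdata = data.frame(x = xx)))]
# [1] 97.501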

Upvotes: 6

mastropi

Reputation: 1451

Assuming you want to find the maximum of the prediction analytically for this particular case, where the orthogonal polynomials are of orders 1 and 2, I propose the following approach:

SUMMARY

1) Infer the polynomial coefficients

This can easily be done by fitting a linear model to the respective polynomial values contained in the model matrix.

2) Differentiate the prediction expression w.r.t. x and set the derivative to 0

Solve the resulting equation for x, using the polynomial coefficients inferred in (1), to obtain the value of x at which the prediction's maximum occurs.

DETAILS

1) Polynomial coefficients

Following from the line where you fit the GLM model, we estimate the coefficients for the polynomial of order 1, p1(x) = a0 + a1*x, and the coefficients for the polynomial of order 2, p2(x) = b0 + b1*x + b2*x^2:

X = model.matrix(model)
p1 = X[, "poly(x, 2)1"]
p2 = X[, "poly(x, 2)2"]

p1.lm = lm(p1 ~ x)
a0 = p1.lm$coeff["(Intercept)"]
a1 = p1.lm$coeff["x"]

p2.lm = lm(p2 ~ x + I(x^2))
b0 = p2.lm$coeff["(Intercept)"]
b1 = p2.lm$coeff["x"]
b2 = p2.lm$coeff["I(x^2)"]

This gives:

> c(a0, a1, b0, b1, b2)
   (Intercept)              x    (Intercept)              x         I(x^2) 
-1.2308840e-01  1.2247602e-03  1.6050353e-01 -4.7674315e-03  2.3718565e-05 
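
Since p1 and p2 are exact polynomials in x of degrees 1 and 2, the linear models above fit them without residual error, so the estimated coefficients are exact up to floating point. A quick check of my own:

# The fitted polynomial forms reproduce the poly() columns exactly:
all.equal(as.numeric(a0 + a1 * x), as.numeric(p1))             # TRUE
all.equal(as.numeric(b0 + b1 * x + b2 * x^2), as.numeric(p2))  # TRUE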

2) Derivative of the prediction to find the maximum

The expression for the prediction z (on the link scale, i.e. before applying the inverse link function) is:

z = Intercept + coef1 * p1(x) + coef2 * p2(x)

Differentiating this expression w.r.t. x and setting the derivative to 0, we obtain:

coef1 * a1 + coef2 * (b1 + 2 * b2 * xmax) = 0

Solving for xmax we get:

xmax = - (coef1 * a1 + coef2 * b1) / (2 * coef2 * b2)

In R code, this is computed as:

coef1 = as.numeric( model$coeff["poly(x, 2)1"] )
coef2 = as.numeric( model$coeff["poly(x, 2)2"] )
(xmax = - ( coef1 * a1 + coef2 * b1 ) / (2 * coef2 * b2))

which gives:

        x 
97.501114

CHECK

We can verify the maximum by adding it to the prediction's curve as a green cross:

# Prediction curve computed analytically
Intercept = model$coeff["(Intercept)"]
pred.analytical = family$linkinv( Intercept + coef1 * p1 + coef2 * p2 )

# Find the prediction's maximum analytically
pred.max = family$linkinv( Intercept + coef1 * (a0 + a1 * xmax) +
                                       coef2 * (b0 + b1 * xmax + b2 * xmax^2) )


# Plot
plot(x, y)
# The following two lines should coincide!
lines(x, pred.analytical, col = 3)
lines(x, family$linkinv(predict(model)), col = 2)
# Location of the maximum!
points(xmax, pred.max, pch="x", col="green")

which gives:

[Plot of x vs. y with the two coinciding prediction curves (red and green) and the maximum marked as a green "x" at xmax]

Upvotes: 3
