teunbrand

Reputation: 38053

How to interpret coefficients of logistic regression

I'm trying to figure out how the coefficients of logistic regression with a polynomial term relate to predictions. Specifically, I'm interested in the location on the x-axis where the prediction is highest. Example below:

set.seed(42)

# Setup some dummy data
x <- 1:200
y <- rep(0, length(x))
y[51:150] <- rbinom(100, 1, 0.5)

# Fit a model
family <- binomial()
model  <- glm(y ~ poly(x, 2), family = family)

# Illustrate model
plot(x, y)
lines(x, family$linkinv(predict(model)), col = 2)

The model above gives me these coefficients:

coef(model)
#> (Intercept) poly(x, 2)1 poly(x, 2)2 
#>   -1.990317   -3.867855  -33.299893

Created on 2021-08-03 by the reprex package (v1.0.0)

The manual page for poly() states the following:

The orthogonal polynomial is summarized by the coefficients, which can be used to evaluate it via the three-term recursion given in Kennedy & Gentle (1980, pp. 343–4), and used in the predict part of the code.

However, I don't have access to the book, nor am I able to discern from the predict.glm S3 method how these coefficients are handled. Is there a way to reconstruct the location of the summit (around 100 in the example) from the coefficients alone (i.e. without using predict() to find the maximum)?

Upvotes: 4

Views: 533

Answers (2)

mastropi

Reputation: 1451

Derivation of the location of the predicted maximum from the theoretical expressions of the orthogonal polynomials

I got hold of a copy of the "Statistical Computing" book by Kennedy and Gentle (1980) referenced in the documentation of poly, and now share my findings about the calculation of the orthogonal polynomials and how we can use them to find the location x of the maximum predicted value.

The orthogonal polynomials presented in the book (pp. 343-4) are monic (i.e. the highest-order coefficient is always 1) and are obtained by the following recurrence procedure:

p(0,x)  = 1
p(-1,x) = 0
p(j+1,x) = (x - rho(j+1)) * p(j,x) - gamma(j) * p(j-1,x),   for j = 0, 1, ..., q-1

where q is the number of orthogonal polynomials considered.
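
To make the recursion concrete, here is a minimal R sketch of my own (the helper name monic_poly is mine, not from the book or from poly()) that evaluates the monic polynomials on the data points, given the rho and gamma parameters defined below:

# Evaluate the monic orthogonal polynomials p(0,x), ..., p(q,x) on the data
# points via the three-term recursion. Illustrative helper: `rho` has length q
# (rho(1), ..., rho(q)) and `gamma` has length q - 1 (gamma(1), ..., gamma(q-1)).
monic_poly <- function(x, rho, gamma = numeric(0)) {
  q <- length(rho)
  P <- matrix(0, nrow = length(x), ncol = q + 1)  # columns hold p(0,x), ..., p(q,x)
  P[, 1] <- 1                                     # p(0,x) = 1 (and p(-1,x) = 0)
  P[, 2] <- (x - rho[1]) * P[, 1]                 # p(1,x) = (x - rho(1)) * p(0,x)
  if (q >= 2) for (j in 2:q) {
    P[, j + 1] <- (x - rho[j]) * P[, j] - gamma[j - 1] * P[, j - 1]
  }
  P
}

With the values rho1, rho2 and gamma1 derived below, the last two columns of monic_poly(x, c(rho1, rho2), gamma1) are proportional to the two columns of poly(x, 2); they only differ by the normalization constants discussed further down.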

Note the following relationship of the above terminology with the documentation of poly:

  1. The "three-term recursion" appearing in the excerpt included in your question is the RHS of the third expression which has precisely three terms.
  2. The rho(j+1) coefficients in the third expression are called "centering constants".
  3. The gamma(j) coefficients in the third expression do not have a name in the documentation but they are directly related to the "normalization constants", as seen below.

For reference, here I paste the relevant excerpt of the "Value" section of the poly documentation:

A matrix with rows corresponding to points in x and columns corresponding to the degree, with attributes "degree" specifying the degrees of the columns and (unless raw = TRUE) "coefs" which contains the centering and normalization constants used in constructing the orthogonal polynomials

Going back to the recurrence, we can derive the values of the parameters rho(j+1) and gamma(j) from the third expression by imposing the orthogonality condition of p(j+1) w.r.t. p(j) and p(j-1).
(It is important to note that the orthogonality condition is not an integral but a summation over the n observed x points, so the polynomial coefficients depend on the data! This is not the case, for instance, for the Chebyshev orthogonal polynomials.)
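This discrete orthogonality is easy to verify directly. As a quick check (my own addition), the columns returned by poly() are orthonormal w.r.t. summation over the observed points:

# The cross-products sum_over_i{ r(j,x_i) * r(k,x_i) } form (numerically)
# the identity matrix: off-diagonal entries ~0, diagonal entries 1.
op <- poly(x, 2)
round(crossprod(op), 10)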

The expressions for the parameters become:
rho(j+1) = sum_over_i{ x_i * p(j,x_i)^2 } / sum_over_i{ p(j,x_i)^2 }
gamma(j) = sum_over_i{ p(j,x_i)^2 } / sum_over_i{ p(j-1,x_i)^2 }

For the polynomials of orders 1 and 2 used in your regression, we get the following expressions, already written in R code:

# First we define the number of observations in the data
n = length(x)

# For p1(x):
# p1(x) = (x - rho1) p0(x)      (since p_{-1}(x) = 0)
rho1 = mean(x)

# For p2(x)
# p2(x) = (x - rho2) p1(x) - gamma1
gamma1 = var(x) * (n-1)/n
rho2 = sum( x * (x - mean(x))^2 ) / (n*gamma1)

for which we get:

> c(rho1, rho2, gamma1)
[1]  100.50  100.50 3333.25

Note that the coefs attribute of poly(x,2) is:

> attr(poly(x,2), "coefs")
$alpha
[1] 100.5 100.5

$norm2
[1]          1        200     666650 1777555560

where:

  - $alpha contains the centering constants, i.e. the rho values, which coincide with ours. (Incidentally, the centering constants are all equal to the average of x, for any q, whenever the distribution of x is symmetric; observed and proved.)
  - $norm2 contains the normalization constants, in this case for p(-1,x), p(0,x), p(1,x), and p(2,x). These are the constants c(j) that normalize the polynomials in the recurrence formula, by dividing them by sqrt(c(j)), so that the resulting polynomials r(j,x) satisfy sum_over_i{ r(j,x_i)^2 } = 1. These r(j,x) are the polynomials stored in the object returned by poly().
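
As a sanity check (my own addition, not part of the original derivation), we can rebuild the normalized polynomials r(1,x) and r(2,x) from rho1, rho2 and gamma1 and compare them against the columns of poly(x, 2):

# Rebuild p1(x) and p2(x) from the recurrence parameters, normalize them with
# the norm2 constants, and compare with the columns stored by poly():
p1 <- x - rho1
p2 <- (x - rho2) * p1 - gamma1
nrm <- attr(poly(x, 2), "coefs")$norm2
r12 <- cbind(p1 / sqrt(nrm[3]), p2 / sqrt(nrm[4]))
max(abs(poly(x, 2) - r12))   # ~0 (differences only at floating-point level)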

From the expressions given above, we see that gamma(j) is precisely the ratio of two consecutive normalization constants, namely: gamma(j) = c(j) / c(j-1).

We can check that our gamma1 value coincides with this ratio by computing:

gamma1 == attr(poly(x,2), "coefs")$norm2[3] / attr(poly(x,2), "coefs")$norm2[2]

which returns TRUE.

Going back to your problem of finding the maximum of the values predicted by your model, we can:

  1. Express the predicted value as a function of r(1,x) and r(2,x) and the coefficients from the logistic regression, namely:

    pred(x) = beta0 + beta1 * r(1,x) + beta2 * r(2,x)

  2. Differentiate the expression w.r.t. x, set the derivative to 0 and solve for x, as spelled out below.
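
Spelling out step 2: since r(1,x) = (x - rho1) / alpha1 and r(2,x) = ( (x - rho2) * (x - rho1) - gamma1 ) / alpha2, with alpha(j) = sqrt( c(j) ) as computed below, the derivative of the linear predictor is:

d pred(x) / dx = beta1/alpha1 + beta2/alpha2 * (2*x - rho1 - rho2) = 0

and solving for x gives:

xmax = ( beta2/alpha2 * (rho1 + rho2) - beta1/alpha1 ) / (2 * beta2/alpha2)

which is exactly the expression computed in the code below.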

In R code:

# Get the normalization factors alpha(j) = sqrt( c(j) ), used to obtain r(j,x)
# from p(j,x) as r(j,x) = p(j,x) / sqrt( c(j) ) = p(j,x) / alpha(j)
# (not to be confused with the $alpha attribute, which holds the centering constants)
alpha1 = sqrt( attr(poly(x,2), "coefs")$norm2[3] )
alpha2 = sqrt( attr(poly(x,2), "coefs")$norm2[4] )

# Get the logistic regression coefficients (beta1 and beta2)
coef1 = as.numeric( model$coeff["poly(x, 2)1"] )
coef2 = as.numeric( model$coeff["poly(x, 2)2"] )

# Compute the x at which the maximum occurs, from the expression obtained by
# differentiating the linear predictor pred(x) = beta0 + beta1*r(1,x) + beta2*r(2,x)
# w.r.t. x and setting the derivative to 0.
xmax = ( alpha2^-1 * coef2 * (rho1 + rho2) - alpha1^-1 * coef1 ) / (2 * alpha2^-1 * coef2)

which gives:

> xmax
[1] 97.501114

i.e. the same value obtained with the "empirical" method described in my other answer below.


The full code to obtain the location x of the maximum of the predicted values, starting off from the code you provided, is:

# First we define the number of observations in the data
n = length(x)

# Parameters for p1(x):
# p1(x) = (x - rho1) p0(x)      (since p_{-1}(x) = 0)
rho1 = mean(x)

# Parameters for p2(x)
# p2(x) = (x - rho2) p1(x) - gamma1
gamma1 = var(x) * (n-1)/n
rho2 = mean( x * (x - mean(x))^2 ) / gamma1

# Get the normalization factors alpha(j) = sqrt( c(j) ), used to obtain r(j,x)
# from p(j,x) as r(j,x) = p(j,x) / sqrt( c(j) ) = p(j,x) / alpha(j)
# (not to be confused with the $alpha attribute, which holds the centering constants)
alpha1 = sqrt( attr(poly(x,2), "coefs")$norm2[3] )
alpha2 = sqrt( attr(poly(x,2), "coefs")$norm2[4] )

# Get the logistic regression coefficients (beta1 and beta2)
coef1 = as.numeric( model$coeff["poly(x, 2)1"] )
coef2 = as.numeric( model$coeff["poly(x, 2)2"] )

# Compute the x at which the maximum occurs, from the expression obtained by
# differentiating the linear predictor pred(x) = beta0 + beta1*r(1,x) + beta2*r(2,x)
# w.r.t. x and setting the derivative to 0.
( xmax = ( alpha2^-1 * coef2 * (rho1 + rho2) - alpha1^-1 * coef1 ) / (2 * alpha2^-1 * coef2) )
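
As a final cross-check (my own addition), the analytical xmax can be compared against a brute-force search of the predictions on a fine grid, which must agree up to the grid resolution:

# Evaluate the model on a fine grid and locate the maximum numerically;
# the link function is monotonic, so the maximum of the linear predictor
# coincides with the maximum of the predicted probability.
xx <- seq(min(x), max(x), by = 0.001)
xx[which.max(predict(model, newdata = data.frame(x = xx)))]
# [1] 97.501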

Upvotes: 6

mastropi

Reputation: 1451

Assuming you want to find the maximum of the prediction analytically for this particular case, where the orthogonal polynomials are of orders 1 and 2, I propose the following approach:

SUMMARY

1) Infer the polynomial coefficients

This can easily be done by fitting a linear model to the respective polynomial values contained in the model matrix.

2) Differentiate the prediction expression w.r.t. x and set the derivative to 0

Solve the resulting equation for x, using the polynomial coefficients inferred in (1), to obtain the value of x at which the prediction's maximum occurs.

DETAILS

1) Polynomial coefficients

Following from the line where you fit the GLM model, we estimate the coefficients for the polynomial of order 1, p1(x) = a0 + a1*x, and the coefficients for the polynomial of order 2, p2(x) = b0 + b1*x + b2*x^2:

X = model.matrix(model)
p1 = X[, "poly(x, 2)1"]
p2 = X[, "poly(x, 2)2"]

p1.lm = lm(p1 ~ x)
a0 = p1.lm$coeff["(Intercept)"]
a1 = p1.lm$coeff["x"]

p2.lm = lm(p2 ~ x + I(x^2))
b0 = p2.lm$coeff["(Intercept)"]
b1 = p2.lm$coeff["x"]
b2 = p2.lm$coeff["I(x^2)"]

This gives:

> c(a0, a1, b0, b1, b2)
   (Intercept)              x    (Intercept)              x         I(x^2) 
-1.2308840e-01  1.2247602e-03  1.6050353e-01 -4.7674315e-03  2.3718565e-05 
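
Since p1 and p2 are exact polynomials in x of degrees 1 and 2, the linear models above fit them without residual error, so the estimated coefficients are exact up to floating point. A quick check of my own:

# The fitted polynomial forms reproduce the poly() columns exactly:
all.equal(as.numeric(a0 + a1 * x), as.numeric(p1))             # TRUE
all.equal(as.numeric(b0 + b1 * x + b2 * x^2), as.numeric(p2))  # TRUE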

2) Derivative of the prediction to find the maximum

The expression for the prediction z (on the link scale, i.e. before applying the inverse link function) is:

z = Intercept + coef1 * p1(x) + coef2 * p2(x)

Differentiating this expression w.r.t. x and setting the derivative to 0, we obtain:

coef1 * a1 + coef2 * (b1 + 2 * b2 * xmax) = 0

Solving for xmax we get:

xmax = - (coef1 * a1 + coef2 * b1) / (2 * coef2 * b2)

In R code, this is computed as:

coef1 = as.numeric( model$coeff["poly(x, 2)1"] )
coef2 = as.numeric( model$coeff["poly(x, 2)2"] )
(xmax = - ( coef1 * a1 + coef2 * b1 ) / (2 * coef2 * b2))

which gives:

        x 
97.501114

CHECK

We can verify the maximum by adding it to the prediction's curve as a green cross:

# Prediction curve computed analytically
Intercept = model$coeff["(Intercept)"]
pred.analytical = family$linkinv( Intercept + coef1 * p1 + coef2 * p2 )

# Find the prediction's maximum analytically
pred.max = family$linkinv( Intercept + coef1 * (a0 + a1 * xmax) +
                                       coef2 * (b0 + b1 * xmax + b2 * xmax^2) )


# Plot
plot(x, y)
# The following two lines should coincide!
lines(x, pred.analytical, col = 3)
lines(x, family$linkinv(predict(model)), col = 2)
# Location of the maximum!
points(xmax, pred.max, pch="x", col="green")

which gives:

[Plot of x vs. y with the two coinciding prediction curves (red and green) and the maximum marked as a green "x" at xmax]

Upvotes: 3
