Reputation: 41
I am puzzling over an issue with glm where interaction terms don't show up in the estimation results unless they are entered as separate variables. I was not able to reproduce the problem using the mtcars data set, but there is nothing unusual about my data.
The first three models below give the same results, while the fourth gives the expected result: Code:
probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)
probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original*pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)
probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original**pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)
df <- df %>% mutate(squared = pop1_3km_original*pop1_3km_original, .after=pop1_3km_original)
probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original+ squared, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)
Output:
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Warning message:
In eval(family$initialize) : non-integer #successes in a binomial glm!
> summary(probit_model2)
Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original, family = binomial(link = "probit"),
data = df, weights = indiv_weight)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.784 -3.079 -2.257 2.729 8.409
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.33000 0.01479 22.31 <2e-16 ***
wgrp1 -0.77032 0.01717 -44.87 <2e-16 ***
pop1_3km_original -0.44447 0.01958 -22.70 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 31927 on 2327 degrees of freedom
Residual deviance: 29474 on 2325 degrees of freedom
(1239 observations deleted due to missingness)
AIC: 29642
Number of Fisher Scoring iterations: 5
>
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original*pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Warning message:
In eval(family$initialize) : non-integer #successes in a binomial glm!
> summary(probit_model2)
Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original * pop1_3km_original,
family = binomial(link = "probit"), data = df, weights = indiv_weight)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.784 -3.079 -2.257 2.729 8.409
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.33000 0.01479 22.31 <2e-16 ***
wgrp1 -0.77032 0.01717 -44.87 <2e-16 ***
pop1_3km_original -0.44447 0.01958 -22.70 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 31927 on 2327 degrees of freedom
Residual deviance: 29474 on 2325 degrees of freedom
(1239 observations deleted due to missingness)
AIC: 29642
Number of Fisher Scoring iterations: 5
>
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original**pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Error in terms.formula(formula, data = data) : invalid power in formula
> summary(probit_model2)
Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original * pop1_3km_original,
family = binomial(link = "probit"), data = df, weights = indiv_weight)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.784 -3.079 -2.257 2.729 8.409
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.33000 0.01479 22.31 <2e-16 ***
wgrp1 -0.77032 0.01717 -44.87 <2e-16 ***
pop1_3km_original -0.44447 0.01958 -22.70 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 31927 on 2327 degrees of freedom
Residual deviance: 29474 on 2325 degrees of freedom
(1239 observations deleted due to missingness)
AIC: 29642
Number of Fisher Scoring iterations: 5
>
> df <- df %>% mutate(squared = pop1_3km_original*pop1_3km_original, .after=pop1_3km_original)
>
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original+ squared, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Warning message:
In eval(family$initialize) : non-integer #successes in a binomial glm!
> summary(probit_model2)
Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original + squared,
family = binomial(link = "probit"), data = df, weights = indiv_weight)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.839 -3.056 -2.288 2.755 8.401
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.31527 0.01689 18.669 < 2e-16 ***
wgrp1 -0.76865 0.01720 -44.702 < 2e-16 ***
pop1_3km_original -0.35047 0.05525 -6.344 2.24e-10 ***
squared -0.07031 0.03878 -1.813 0.0698 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 31927 on 2327 degrees of freedom
Residual deviance: 29471 on 2324 degrees of freedom
(1239 observations deleted due to missingness)
AIC: 29641
Number of Fisher Scoring iterations: 5
Upvotes: 1
Views: 691
Reputation: 72828
This type of interaction is actually a polynomial, and *
won't work here. You may want to add the squared term, but use the I()
function to prevent glm
from interpreting it as arithmetic operation.
glm(am ~ mpg*mpg, family=binomial(link='probit'), mtcars)$coe ## won't work
# (Intercept) mpg
# -3.9102065 0.1822684
glm(am ~ mpg + I(mpg^2), family=binomial(link='probit'), mtcars)$coe ## works
# (Intercept) mpg I(mpg^2)
# -1.504927683 -0.064500921 0.006066692
Or use poly(..., raw=TRUE)
glm(am ~ poly(mpg, 2, raw=TRUE), family=binomial(link='probit'), mtcars)$coe
# (Intercept) poly(mpg, 2, raw = TRUE)1 poly(mpg, 2, raw = TRUE)2
# -1.504927683 -0.064500921 0.006066692
Or calculate the term beforehand.
glm(am ~ mpg + mpgsq, family=binomial(link='probit'), transform(mtcars, mpgsq=mpg^2))$coe
# (Intercept) mpg mpgsq
# -1.504927683 -0.064500921 0.006066692
Upvotes: 2