Negative binomial glm.nb : Different results with unbalanced variable and complete separation

Question

I am fitting a negative binomial model using glm.nb from the MASS package of R with as output Y depending on variable y1, x, and z . The problem is that I get different estimated coefficients using the same model but with the following equivalent formulas: Y ~ y1 + x * z or Y ~ x * z + y1. (reversing the order)

Below are my data and the results with the two formulas:

z <-c("T3", "T3", "T3", "T2", "T2", "T3", "T3", "T2", "T2", "T2", "T3", "T3", "T3", "T3", "T2", "T3", "T2", "T2", "T1", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T2", "T3", "T2", "T3", "T2", "T2", "T2", "T2", "T3", "T3", "T2", "T2", "T3", "T3", "T3", "T3", "T3", "T3", "T3", "T2", "T3", "T2", "T3", "T3", "T3", "T2", "T1", "T3", "T3", "T3", "T3", "T2", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T3", "T2", "T3", "T2", "T3", "T3", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T2", "T3", "T3", "T3", "T3", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T3", "T3", "T3", "T3", "T3", "T2", "T3", "T2", "T3", "T2", "T2", "T2", "T2", "T3", "T3", "T3", "T2", "T2", "T3", "T2", "T3", "T3", "T2", "T2")
x <- c("B-", "B-", "B+", "B+", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B-", "B+", "B-", "B+", "B+", "B-", "B-", "B+", "B-", "B+", "B+", "B-", "B-", "B+", "B+", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B+", "B-", "B-", "B+", "B+", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B+", "B+", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B+", "B+", "B+", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B+", "B+", "B-", "B+", "B+", "B+", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B+", "B-", "B-", "B+", "B+", "B-", "B+", "B+", "B-", "B-", NA )
Y <- c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 13, 19, 1, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 1, 0, 2, 0, 0, 6, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 10, 0, 0, 0, 0, 0, 0, 3)
y1 <- c(1, 3, 0, 1, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 1, 0, 5, 0, 2, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 1, 12, 1, 0, 0, 0, 2, 0, 4, 0, 0, 0, 0, 4, 0, 0, 0, 2, 2, 0, 0, 1, 0, 0, 16, 0, 14, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 6, 0, 0, 1, 0, 0, 1, 13)

data1 <- data.frame(Y, y1, z, x)
glm.nb(Y ~  x * z + y1, data = data1)

glm.nb(Y ~  y1 + x * z, data = data1)

> glm.nb(Y ~  x * z + y1, data = data1)

Call:  glm.nb(formula = Y ~ x * z + y1, data = data1, init.theta = 0.2911311528, 
    link = log)

Coefficients:
(Intercept)           x1           z1           z2           y1        x1:z1        x1:z2  
  -13.27592     -0.06054    -24.03075     11.75385      0.38529      0.54648     -0.11488  

Degrees of Freedom: 112 Total (i.e. Null);  106 Residual
  (1 observation deleted due to missingness)
Null Deviance:      94.45 
Residual Deviance: 61.74    AIC: 199.8

> glm.nb(Y ~  y1 + x * z, data = data1)

Call:  glm.nb(formula = Y ~ y1 + x * z, data = data1, init.theta = 0.2911311528, 
    link = log)

Coefficients:
(Intercept)           y1           x1           z1           z2        x1:z1        x1:z2  
   -13.4950       0.3853      -0.7217     -24.4689      11.9729      -0.7759       0.5463  

Degrees of Freedom: 112 Total (i.e. Null);  106 Residual
  (1 observation deleted due to missingness)
Null Deviance:      94.45 
Residual Deviance: 61.74    AIC: 199.8

I get different values of coefficients between the two formulas. Knowing that the variable z has only two observations with the T1 modality, can this influence the estimation of my coefficients?

Is it possible to have identical results with the two formulas Y ~ y1 + x * z and Y ~ x * z + y1 even if there is a convergence problem?

Thank you for your answers and your help.

Negative binomial glm.nb : Different results with unbalanced variable and complete separation

Answers (1)

Related Questions