J L

Reputation: 13

Warnings from glm.fit when including an integer variable in R

Result=glm(dep~re+ind1+ind2+ind3, data=ds,family=binomial)
Warning messages:
1: glm.fit: algorithm did not converge 
2: glm.fit: fitted probabilities numerically 0 or 1 occurred 

re is an integer variable that takes values from 0 to 6. If I exclude it from the model, the warnings above no longer appear.
For reference, the type and summary of re are shown below. Can anyone help me figure this out?

> typeof(ds$re)
[1] "integer"
> class(ds$re)
[1] "integer"
> is.numeric(ds$re)
[1] TRUE

> summary(ds$re)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000   2.000   2.189   4.000   6.000 

Upvotes: 0

Views: 616

Answers (1)

Allan Cameron

Reputation: 174556

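The problem is not that re is an integer. The second warning tells us that re causes complete separation of your data. In other words, dep is completely predictable from re, so the fitted probabilities are pushed to 0 or 1. When that happens the coefficient estimates keep growing without bound, which is why the algorithm does not converge.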

Although we don't have your data, we can construct a dataset which produces the same warnings using exactly your code, simply by making dep completely dependent on re:

set.seed(1)
ds <- data.frame(re = as.integer(sample(0:6, 20, TRUE)),
                 ind1 = rnorm(20), ind2 = rnorm(20), ind3 = rnorm(20))
ds$dep <- ifelse(ds$re > 3, 1, 0)

If we run your code on this, we get:

Result = glm(dep~re+ind1+ind2+ind3, data=ds,family=binomial)
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

If we look at the predicted probabilities, we will see that, within one part in a billion, the probabilities are either 0 or 1:

round(predict(Result, type = "response"), 9)
#>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
#>  0  0  1  0  0  1  1  0  1  0  0  0  0  1  1  0  1  1  0  1 
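A quick way to see the separation directly is to cross-tabulate the outcome against the suspect predictor. The sketch below rebuilds only the re and dep columns of the made-up data so that it runs on its own; `tab` and `outcomes_per_level` are names chosen here for illustration. If every value of re is observed with only one value of dep, re separates the data.

```r
# Rebuild the relevant columns of the made-up data
set.seed(1)
ds <- data.frame(re = as.integer(sample(0:6, 20, TRUE)))
ds$dep <- ifelse(ds$re > 3, 1, 0)

# Rows are the observed values of re, columns are the values of dep
tab <- table(re = ds$re, dep = ds$dep)
tab

# Count how many distinct outcomes occur at each value of re;
# exactly one per level means dep is fully determined by re
outcomes_per_level <- rowSums(tab > 0)
all(outcomes_per_level == 1)
#> [1] TRUE
```

On real data the table will rarely be this clean, but rows where one column is entirely zero are the ones worth looking at.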

The fix for this depends very much on context and your actual data, neither of which is included in the question. For example, with my made-up data set, we could just throw out the model and say that we can predict dep perfectly using re alone, without the other variables. Your own data might have other problems.
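To find out which predictor is responsible in your own data, one option is to check each numeric predictor for a threshold that splits the outcome perfectly. This is only a rough sketch: `separates()` is a hypothetical helper written for this answer, not part of base R or any package. It assumes a 0/1 outcome with both classes present, and it only detects separation by a single variable, not by a combination of variables.

```r
# Hypothetical helper: TRUE if one cut-point on x splits y (coded 0/1) perfectly,
# i.e. the ranges of x in the two outcome groups do not overlap
separates <- function(x, y) {
  r0 <- range(x[y == 0])
  r1 <- range(x[y == 1])
  r0[2] < r1[1] || r1[2] < r0[1]
}

# Same made-up data as above
set.seed(1)
ds <- data.frame(re = as.integer(sample(0:6, 20, TRUE)),
                 ind1 = rnorm(20), ind2 = rnorm(20), ind3 = rnorm(20))
ds$dep <- ifelse(ds$re > 3, 1, 0)

# Apply the check to every predictor in the model
sapply(ds[c("re", "ind1", "ind2", "ind3")], separates, y = ds$dep)
```

Here re is flagged because dep was built from it. A predictor flagged this way in real data is a candidate for the kind of context-dependent decision described above.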

Created on 2022-08-09 by the reprex package (v2.0.1)

Upvotes: 1
