Reputation: 13
Result=glm(dep~re+ind1+ind2+ind3, data=ds,family=binomial)
Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred
re is an integer variable that takes values from 0 to 6. If I exclude it from the model, the warnings above no longer appear.
For reference, the type information for re is shown below. Can anyone help me figure this out?
> typeof(ds$re)
[1] "integer"
> class(ds$re)
[1] "integer"
> is.numeric(ds$re)
[1] TRUE
> summary(ds$re)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 2.000 2.189 4.000 6.000
Upvotes: 0
Views: 616
Reputation: 174556
The problem is not that re is an integer. The warnings tell us that re causes complete separation of your data. In other words, dep is completely predictable from re, so the fitted probabilities are numerically 0 or 1 and the fitting algorithm cannot settle on finite coefficient estimates.
Although we don't have your data, we can construct a data set that produces the same warnings with exactly your code, simply by making dep completely dependent on re:
set.seed(1)
ds <- data.frame(re = as.integer(sample(0:6, 20, TRUE)),
                 ind1 = rnorm(20), ind2 = rnorm(20), ind3 = rnorm(20))
ds$dep <- ifelse(ds$re > 3, 1, 0)
If we run your code on this, we get:
Result = glm(dep~re+ind1+ind2+ind3, data=ds,family=binomial)
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
If we look at the predicted probabilities, we will see that, within one part in a billion, the probabilities are either 0 or 1:
round(predict(Result, type = "response"), 9)
#>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
#>  0  0  1  0  0  1  1  0  1  0  0  0  0  1  1  0  1  1  0  1
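A quick way to see the separation directly is to cross-tabulate re against dep. The sketch below assumes the made-up ds from above (rebuilt here so the snippet runs on its own); the names tab and separated are just illustrative.

```r
# Rebuild the made-up data set (assumption: same construction as above)
set.seed(1)
ds <- data.frame(re = as.integer(sample(0:6, 20, TRUE)),
                 ind1 = rnorm(20), ind2 = rnorm(20), ind3 = rnorm(20))
ds$dep <- ifelse(ds$re > 3, 1, 0)

# Counts of dep (columns) for each observed value of re (rows)
tab <- with(ds, table(re, dep))
tab

# TRUE if a single cut point on re splits dep == 0 from dep == 1
separated <- max(ds$re[ds$dep == 0]) < min(ds$re[ds$dep == 1])
separated
```

On your own data, a table in which all the dep == 0 rows sit on one side of some value of re and all the dep == 1 rows on the other would suggest the same issue.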
The fix depends very much on context and on your actual data, neither of which is included in the question. For example, with my made-up data set, we could just throw out the model and say that we can predict dep perfectly using re without the other variables. Your own data might have other problems.
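To illustrate the "predict dep from re alone" idea on the made-up data, here is a minimal sketch that tries each possible cut point on re and records how often the rule re > k matches dep. The names cuts and acc are hypothetical, and this applies only to the constructed example, not necessarily to your data.

```r
# Rebuild the made-up data set (assumption: same construction as above)
set.seed(1)
ds <- data.frame(re = as.integer(sample(0:6, 20, TRUE)),
                 ind1 = rnorm(20), ind2 = rnorm(20), ind3 = rnorm(20))
ds$dep <- ifelse(ds$re > 3, 1, 0)

# For each candidate cut point k, the share of rows where (re > k) equals dep
cuts <- 0:5
acc <- sapply(cuts, function(k) mean(as.integer(ds$re > k) == ds$dep))
names(acc) <- paste0("re > ", cuts)
acc
```

If at least one cut point reaches an accuracy of 1, a simple threshold rule on re reproduces dep exactly, and a logistic regression adds nothing for that data set.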
Created on 2022-08-09 by the reprex package (v2.0.1)
Upvotes: 1