How come I get this logistic regression error in glm/glm2 if I don't exhibit linear separation in my data?

Question

I started running into the error (converted from warning):

glm.fit (or glm.fit2): fitted probabilities numerically 0 or 1 occurred

I found this link referencing linear separation of data:

[R] glm.fit: "fitted probabilities numerically 0 or 1 occurr

So I went hunting through the data and found a small reproducible example, taken from a small subset of the data, where I don't actually see linear separation and yet I still get the error (with both glm and glm2):

response = c(0,1,0,1,0,0,0,0,0,0)
dependent = c(133,571,1401,4930,3134075,44357054,1718619387,1884020779,8970035092,9392823637)
foo = data.frame(y=response,x=dependent)
glm(y ~ x, family=binomial, data=foo)

I can avoid the issue by transforming the dependent via log(x+1). However, this is monotonic and doesn't alter the ordering, so I'm not sure why it helps or whether I should be doing it. The dependents are "microseconds since the last time some event happened", which is why some values can be large. I tried turning it into a two-level factor (recent, not recent), but that loses information and underperforms the raw values.

mlegge · Accepted Answer

I think this is just a feature of the data and the rounding of the floating point calculations going on in the optimization of the maximum likelihood function.

Take a look at the fitted values of the log transformed set:

> response = c(0,1,0,1,0,0,0,0,0,0)
> dependent = c(133,571,1401,4930,3134075,44357054,1718619387,1884020779,8970035092,9392823637)
> 
> foo = data.frame(y=response,x=log(dependent))
> mlog <- glm(y ~ x, family=binomial, data=foo)
> mlog$fitted
          1           2           3           4 
0.584089292 0.484155299 0.422713978 0.340825478 
          5           6           7           8 
0.079815887 0.040011202 0.014931996 0.014562755 
          9          10 
0.009506656 0.009387457

Whereas the untransformed set results in minuscule fitted values:

> foo = data.frame(y=response,x=dependent)
> m <- glm(y ~ x, family=binomial, data=foo)
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 
> m$fitted.values
           1            2            3 
5.007959e-01 5.005387e-01 5.000511e-01 
           4            5            6 
4.979784e-01 6.359085e-04 2.220446e-16 
           7            8            9 
2.220446e-16 2.220446e-16 2.220446e-16 
          10 
2.220446e-16

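This doesn't seem to be a warning related to complete (or quasi-) separation of the data. A monotonic transform keeps the ordering but changes the spacing: on the raw scale x spans nearly eight orders of magnitude, so a single slope drives the linear predictor for the largest values so far negative that their fitted probabilities bottom out at machine precision. I think the warning is pretty informative in this case.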
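Both points can be checked directly. The following is a minimal sketch, not a definitive diagnostic: it reuses the data from the question, tests whether the two classes overlap on x (for a single predictor, overlapping ranges rule out complete or quasi-separation), and then flags the fitted values that fall below the cutoff. The names overlap, eps and flagged are made up for illustration, and the cutoff of 10 * .Machine$double.eps is, to my understanding, what glm.fit compares against before issuing this warning, so treat it as an assumption.

```r
# Data from the question
response  <- c(0, 1, 0, 1, 0, 0, 0, 0, 0, 0)
dependent <- c(133, 571, 1401, 4930, 3134075, 44357054, 1718619387,
               1884020779, 8970035092, 9392823637)
foo <- data.frame(y = response, x = dependent)

# Separation check for one predictor: if the x-ranges of the two classes
# strictly overlap, no single cut point can split the 0s from the 1s.
x0 <- foo$x[foo$y == 0]
x1 <- foo$x[foo$y == 1]
overlap <- min(x1) < max(x0) && min(x0) < max(x1)
overlap

# Refit on the raw scale; the warning is suppressed because we inspect
# the fitted values ourselves.
m <- suppressWarnings(glm(y ~ x, family = binomial, data = foo))

# Assumed cutoff (believed to match glm.fit's internal check)
eps <- 10 * .Machine$double.eps
flagged <- which(m$fitted.values < eps | m$fitted.values > 1 - eps)

# Rows whose fitted probability is numerically 0 or 1
data.frame(x = foo$x[flagged], fitted = m$fitted.values[flagged])
```

The fitted value 2.220446e-16 in the output above is .Machine$double.eps itself, which suggests the inverse link is being clamped rather than the optimizer diverging the way it does under true separation.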
