Thundersheep

Reputation: 45

Error in GLM fitting

I have a working GLM that includes numeric variables such as "surface" and "price". I would like to add log-transformed versions of them to my model.

To do so, I did the following:

data$logprice<-log(data$price)

Then I added it to my model as follows:

model <- glm(variableA ~ logprice + variableB +variableC , binomial)

As soon as I added the log variable, I got the following error:

Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  : 

NA/NaN/Inf in 'x'

I hope you can help explain this error or show me how to fix it. Thanks in advance!

Upvotes: 2

Views: 12491

Answers (1)

Hack-R

Reputation: 23231

You didn't provide your data or runnable code, so it's impossible to say what it was that caused the error in your case. However, I have a pretty good idea.

What I can show you is that, in general, adding a log-transformed predictor does not cause this error:

data(iris)

iris$logprice  <- log(iris$Sepal.Length)
iris$variableA <- ifelse(iris$Species=="setosa",1,0)

model <- glm(variableA ~ logprice, binomial, data = iris)
summary(model)
Call:
glm(formula = variableA ~ logprice, family = binomial, data = iris)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.28282  -0.29561  -0.06431   0.29645   2.13240  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   46.767      7.978   5.862 4.58e-09 ***
logprice     -27.836      4.729  -5.887 3.94e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 190.954  on 149  degrees of freedom
Residual deviance:  72.421  on 148  degrees of freedom
AIC: 76.421

Number of Fisher Scoring iterations: 7

However, let's say you have a value like 0 which cannot survive log transformation without being infinite:

iris$Sepal.Length[1] <- 0
iris$logprice  <- log(iris$Sepal.Length)
iris$variableA <- ifelse(iris$Species=="setosa",1,0)

model <- glm(variableA ~ logprice, binomial, data = iris)
Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  : 
  NA/NaN/Inf in 'x'

Why? Because:

> log(0)
[1] -Inf
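The same check can be run on your own column before fitting. This is only a minimal sketch: the `price` vector below is made up for illustration and stands in for `data$price`.

```r
# Hypothetical price vector containing a zero, a negative value and an NA
price <- c(120, 0, 85, -3, NA, 240)

# log() warns about NaNs for the negative value; suppressWarnings() hides that here
logprice <- suppressWarnings(log(price))

# is.finite() is FALSE for -Inf, Inf, NaN and NA, so this flags every problem value
bad <- !is.finite(logprice)

sum(bad)      # how many values would break glm.fit
which(bad)    # their positions
price[bad]    # the original values behind them
```

If `sum(bad)` is greater than zero, those are the rows to look at first.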

One solution (which is kind of a hack) is to add a tiny bit of jitter, or simply replace 0 with some very small positive value. However, whether that makes good statistical and research sense is beyond the scope of this answer.
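As a rough sketch of that workaround on the iris example, the code below replaces exact zeros with a small constant before taking the log. The value of `eps` is an arbitrary assumption, not a recommendation. The `log1p()` line shows a different but related option: base R's `log1p(x)` computes `log(1 + x)`, which is finite at zero but changes how the coefficient is interpreted.

```r
data(iris)
iris$Sepal.Length[1] <- 0    # reproduce the problematic zero
iris$variableA <- ifelse(iris$Species == "setosa", 1, 0)

# Option 1: replace exact zeros with a small positive constant (eps is arbitrary)
eps <- 1e-6
x <- iris$Sepal.Length
x[x == 0] <- eps
iris$logprice <- log(x)

# Option 2 (a different transformation): log1p(x) = log(1 + x), which is 0 at x = 0
iris$log1pprice <- log1p(iris$Sepal.Length)

# The model now fits because the predictor contains no -Inf
model <- glm(variableA ~ logprice, family = binomial, data = iris)
```

The replaced value becomes an extreme outlier on the log scale, so R may warn about fitted probabilities very close to 0 or 1 even though the fit no longer stops with an error.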

If you have any NA values you can also drop or impute those.
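Dropping the offending rows might look like the sketch below, again on iris with a zero and an NA planted for illustration. As far as I know, `glm()` already omits NA rows under the default `na.action`, but it does not omit `-Inf`, so filtering on `is.finite()` handles both cases in one step.

```r
data(iris)
iris$Sepal.Length[1] <- 0     # becomes -Inf after log()
iris$Sepal.Length[2] <- NA    # stays NA after log()

iris$logprice  <- log(iris$Sepal.Length)
iris$variableA <- ifelse(iris$Species == "setosa", 1, 0)

# Keep only rows whose log value is a finite number
clean <- iris[is.finite(iris$logprice), ]

nrow(iris) - nrow(clean)    # number of rows dropped

model <- glm(variableA ~ logprice, family = binomial, data = clean)
```

It is worth checking how many rows are dropped and whether they differ systematically from the rest before relying on the filtered fit.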

Upvotes: 6
