I have a working GLM with two numeric variables, "surface" and "price". I'd like to add log-transformed versions of them to my model.
To do so, I did the following:
data$logprice <- log(data$price)
Then I added it to my model as follows:
model <- glm(variableA ~ logprice + variableB + variableC, binomial, data = data)
As soon as I added the log term, I got the following error:
Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, :
NA/NaN/Inf in 'x'
Can you help me understand this error, or guide me on how to fix it? Thanks in advance!
You didn't provide your data or reproducible code, so it's impossible to say for certain what caused the error in your case. However, I have a pretty good idea.
What I can show you is that, in general, adding a log-transformed variable does not by itself cause this error:
data(iris)
iris$logprice <- log(iris$Sepal.Length)
iris$variableA <- ifelse(iris$Species=="setosa",1,0)
model <- glm(variableA ~ logprice, binomial, data = iris)
summary(model)
Call:
glm(formula = variableA ~ logprice, family = binomial, data = iris)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.28282  -0.29561  -0.06431   0.29645   2.13240

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   46.767      7.978   5.862 4.58e-09 ***
logprice     -27.836      4.729  -5.887 3.94e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 190.954  on 149  degrees of freedom
Residual deviance:  72.421  on 148  degrees of freedom
AIC: 76.421

Number of Fisher Scoring iterations: 7
However, suppose you have a value of 0, which cannot survive a log transformation without becoming infinite:
iris$Sepal.Length[1] <- 0
iris$logprice <- log(iris$Sepal.Length)
iris$variableA <- ifelse(iris$Species=="setosa",1,0)
model <- glm(variableA ~ logprice, binomial, data = iris)
Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, : NA/NaN/Inf in 'x'
Why? Because:
> log(0)
[1] -Inf
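A quick way to check your own data for this problem is to look for non-finite values in the transformed column before fitting. Below is a minimal, self-contained sketch using the same modified iris example (the names bad_rows and logprice are just illustrative); with your data you would inspect data$logprice and data$price instead.

```r
# Rebuild the example: set one Sepal.Length to 0 so its log becomes -Inf
data(iris)
iris$Sepal.Length[1] <- 0
iris$logprice <- log(iris$Sepal.Length)

# Locate values that are NA, NaN, Inf or -Inf in the transformed column
bad_rows <- which(!is.finite(iris$logprice))
length(bad_rows)                                  # number of problematic rows
iris[bad_rows, c("Sepal.Length", "logprice")]     # show the offending rows

# Underlying cause: zeros (log gives -Inf) or negatives (log gives NaN with a warning)
sum(iris$Sepal.Length <= 0)
```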
One solution (admittedly a hack) is to add a tiny bit of jitter, or simply replace 0 with some very small positive value. However, whether that makes good statistical and research sense is beyond the scope of this answer.
If you have any NA values, you can also drop or impute those.
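A minimal sketch of the "replace 0 with a very small value" workaround, rebuilt so it runs on its own. The constant eps = 1e-6 is an arbitrary assumption: log(1e-6) is roughly -13.8, so the replaced rows end up far from the rest of the data and can be very influential. Treat this as an illustration of the mechanics, not a recommendation.

```r
data(iris)
iris$Sepal.Length[1] <- 0
iris$variableA <- ifelse(iris$Species == "setosa", 1, 0)

# Hypothetical small constant; choose (and justify) it carefully for real data
eps <- 1e-6
x <- iris$Sepal.Length
x[x == 0] <- eps                  # replace exact zeros only; negatives are untouched
iris$logprice <- log(x)

# The model now fits because logprice no longer contains -Inf
model <- glm(variableA ~ logprice, family = binomial, data = iris)
coef(model)
```

R may warn that fitted probabilities numerically 0 or 1 occurred, since the replaced row sits so far out; that is a warning about the fit, not the NA/NaN/Inf error.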
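If you decide to drop the affected rows instead, one option is to keep only rows where the transformed value is finite before fitting. A sketch with the same modified iris data (iris_clean is an illustrative name); note that this shrinks the sample, so it is worth checking how many rows you lose.

```r
data(iris)
iris$Sepal.Length[1] <- 0
iris$logprice <- log(iris$Sepal.Length)
iris$variableA <- ifelse(iris$Species == "setosa", 1, 0)

# Keep only rows whose log-transformed predictor is finite (drops NA, NaN, Inf, -Inf)
iris_clean <- iris[is.finite(iris$logprice), ]
nrow(iris) - nrow(iris_clean)     # number of rows dropped

model <- glm(variableA ~ logprice, family = binomial, data = iris_clean)
summary(model)
```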