I cannot estimate a logit model with a factor variable as the dependent variable. I created a reproducible example to explain the problem better and show the error message.
## create a reproducible example that replicates the problem
set.seed(12)  # reproducibility of the "randomly" generated data
df <- data.frame(dummy = as.factor(rep(c("yes", "no"), 100)),  # factor encoding
                 x = rnorm(n = 200, mean = 5, sd = 1))          # some predictor variable
# fit the regression with the factor response
summary(glm(formula = dummy ~ x, data = df))  # does not work
The error message for this approach is:
Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, :
NA/NaN/Inf in 'y'
In addition: Warning messages:
1: In Ops.factor(y, mu) : ‘-’ not meaningful for factors
2: In Ops.factor(eta, offset) : ‘-’ not meaningful for factors
3: In Ops.factor(y, mu) : ‘-’ not meaningful for factors
I don't quite understand this message. Is there anything wrong with the data scale (factor), or is it a problem with how I apply the function? Any help would be much appreciated.
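Add family = "binomial" to specify that this is a logistic regression, and it works. Without it, glm() uses its default family = gaussian, which treats the response as numeric; the warnings come from glm.fit doing arithmetic such as y - mu on the factor, which is not defined for factors. The binomial family accepts a factor response: the first level is treated as "failure" and the other level as "success", so with the alphabetically sorted levels "no" and "yes" the model describes the probability of "yes":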
> fit <- glm(formula = dummy ~ x, data = df, family = "binomial")
> summary(fit)
Call:
glm(formula = dummy ~ x, family = "binomial", data = df)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.18456  -1.17736  -0.00041   1.17736   1.18342
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.028747   0.734674   0.039    0.969
x           -0.005782   0.145003  -0.040    0.968
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 277.26  on 199  degrees of freedom
Residual deviance: 277.26  on 198  degrees of freedom
AIC: 281.26
Number of Fisher Scoring iterations: 3
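As a rough sketch, assuming the same simulated df from the question (the names fit_factor, fit_numeric and y01 are hypothetical, chosen only for illustration), the following checks which level is being modelled, shows that an explicit 0/1 numeric response gives the same coefficients as the factor response, and turns the estimates into an odds ratio and predicted probabilities.

```r
# Sketch: fitting the logit model with a factor vs. an explicit 0/1 response.
# Assumes the same simulated data as in the question.
set.seed(12)
df <- data.frame(dummy = as.factor(rep(c("yes", "no"), 100)),
                 x = rnorm(n = 200, mean = 5, sd = 1))

# Levels are sorted alphabetically: "no" is the first level ("failure"),
# so the binomial model describes P(dummy == "yes").
levels(df$dummy)

# Logistic regression on the factor response
fit_factor <- glm(dummy ~ x, data = df, family = binomial)

# Hypothetical 0/1 recoding of the same response (1 = "yes")
df$y01 <- as.integer(df$dummy == "yes")
fit_numeric <- glm(y01 ~ x, data = df, family = binomial(link = "logit"))

# Both encodings should give the same estimates
all.equal(coef(fit_factor), coef(fit_numeric))

# Odds ratio for a one-unit increase in x
exp(coef(fit_factor)["x"])

# Predicted probabilities of "yes" at a few illustrative x values
predict(fit_factor, newdata = data.frame(x = c(4, 5, 6)), type = "response")
```

Recoding to 0/1 is optional here; it mainly makes explicit which outcome counts as "success", which is otherwise decided by the factor's level order (relevel() can change it).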