joaoal

Reputation: 1992

Estimating a logit model with a factor dependent variable produces an error

I cannot estimate a logit model with a factor as the dependent variable. I created a reproducible example to illustrate the problem and show the error message.

## create a reproducible example that replicates the problem
set.seed(12) # reproducibility of the "randomly" generated data. 
df<-data.frame(dummy=as.factor(rep(c("yes","no"),100)), # factor encoding
               x=rnorm(n = 200,mean = 5,sd = 1)) # some predictor variable


# calculate regression with different encodings
summary(glm(formula = dummy~x,data = df)) # does not work

The error message for this approach is:

Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  : 
  NA/NaN/Inf in 'y'
    In addition: Warning messages:
1: In Ops.factor(y, mu) : ‘-’ not meaningful for factors
2: In Ops.factor(eta, offset) : ‘-’ not meaningful for factors
3: In Ops.factor(y, mu) : ‘-’ not meaningful for factors

I don't quite understand this message. Is there something wrong with the data type (factor), or is it a problem with how I call the function? Any help would be much appreciated.

Upvotes: 0

Views: 245

Answers (1)

mysteRious

Reputation: 4314

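glm() defaults to family = gaussian, i.e. an ordinary linear model, which needs a numeric response; that is why a factor response fails with the Ops.factor warnings. Add family="binomial" to specify that this is a logistic regression and it works: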

> fit <- glm(formula = dummy ~ x, data = df, family = "binomial")
> summary(fit)

Call:
glm(formula = dummy ~ x, family = "binomial", data = df)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.18456  -1.17736  -0.00041   1.17736   1.18342  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.028747   0.734674   0.039    0.969
x           -0.005782   0.145003  -0.040    0.968

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 277.26  on 199  degrees of freedom
Residual deviance: 277.26  on 198  degrees of freedom
AIC: 281.26

Number of Fisher Scoring iterations: 3
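
With a factor response, it helps to know which level the model treats as the "success". For the binomial family, the first factor level is the failure and all other levels count as success, so with alphabetically sorted levels the fit above models the probability of "yes". The following is a minimal sketch that checks this on the data from the question; the column name dummy01 and the object names fit01 and p are only introduced here for illustration.

```r
# Rebuild the data from the question
set.seed(12)
df <- data.frame(dummy = as.factor(rep(c("yes", "no"), 100)),
                 x = rnorm(n = 200, mean = 5, sd = 1))

# Levels are sorted alphabetically: "no" comes first (failure), "yes" second (success)
levels(df$dummy)

# Logistic regression on the factor response
fit <- glm(dummy ~ x, data = df, family = binomial)

# The same model with an explicit 0/1 response (1 = "yes");
# dummy01 is a helper column added only for this comparison
df$dummy01 <- as.integer(df$dummy == "yes")
fit01 <- glm(dummy01 ~ x, data = df, family = binomial)

# Both encodings should give the same coefficients
all.equal(coef(fit), coef(fit01))

# Fitted probabilities of dummy == "yes" for each observation
p <- predict(fit, type = "response")
head(p)

# Coefficients on the odds-ratio scale
exp(coef(fit))
```

If "no" should be the modelled outcome instead, one option is to reorder the levels before fitting, for example with relevel(df$dummy, ref = "yes").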

Upvotes: 1
