Reputation: 1848
I'm running a glm in R on a data frame with 2 variables.
str(INV)
'data.frame': 5614 obs. of 2 variables:
$ MSACode: Factor w/ 70 levels "40","80","440",..: 37 64 58 56 66 14 38 37 66 14 ...
$ NotPaid: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
The code I used to run it:
GlmModel <- glm(NotPaid ~ MSACode,family=binomial(link="logit"),data=training)
print(summary(GlmModel))
The summary shows a separate coefficient for each individual MSACode value rather than a single coefficient for the field.
> print(summary(GlmModel))
Call:
glm(formula = NotPaid ~ MSACode, family = binomial(link = "logit"),
data = training)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9728 -0.8352 -0.6501 0.9346 2.8245
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.657e+01 1.697e+03 -0.010 0.992
MSACode80 1.462e+01 1.697e+03 0.009 0.993
MSACode440 -7.494e-07 1.924e+03 0.000 1.000
MSACode520 1.547e+01 1.697e+03 0.009 0.993
MSACode640 1.587e+01 1.697e+03 0.009 0.993
MSACode720 1.477e+01 1.697e+03 0.009 0.993
MSACode870 1.657e+01 1.697e+03 0.010 0.992
MSACode1080 1.455e+01 1.697e+03 0.009 0.993
I don't understand these results - why is it showing each MSACode value separately? Thanks.
Upvotes: 1
Views: 528
Reputation: 226532
I'm sure this is a duplicate, but can't find it.
The problem is that, because MSACode is a factor (possibly because of a value in that column of an input file that couldn't be interpreted as numeric), R is assuming you want to treat it as a categorical rather than as a continuous predictor; hence, it gives you n-1 parameters (where n is the number of levels) rather than 1 to describe its effect. You can convert it back to numeric by:
INV <- transform(INV,
MSACode = as.numeric(as.character(MSACode)))
and then re-run your model. (This post explains why we need as.numeric(as.character(.)) rather than as.numeric(), and explains that as.numeric(levels(f))[f] is more efficient, although I rarely bother worrying about that level of efficiency ...)
Upvotes: 2