Reputation: 8041
You'll find a manual implementation of logistic regression in Excel at: http://blog.excelmasterseries.com/2014/06/logistic-regression-performed-in-excel.html.
This implementation uses the dataset below and reports the following coefficients:
b0 = 12.48285608
b1 = -0.117031374
b2 = -1.469140055
However, when I analyze the same dataset with glm() in R, the results are not the same:
b0 = 1.687445
b1 = -0.012525
b2 = -0.116473
d <- structure(list(Y = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), X1 = c(78L, 73L, 73L,
71L, 68L, 59L, 57L, 49L, 35L, 27L, 59L, 57L, 44L, 38L, 36L, 36L,
22L, 22L, 15L, 10L), X2 = c(8L, 8L, 5L, 7L, 5L, 4L, 7L, 5L, 4L,
7L, 3L, 4L, 5L, 5L, 4L, 2L, 6L, 5L, 4L, 6L)), .Names = c("Y",
"X1", "X2"), class = "data.frame", row.names = c(NA, -20L))
summary(glm(Y ~ X1+X2, data=d), family=binomial(link='logit'))
#
# Call:
# glm(formula = Y ~ X1 + X2, data = d)
#
# Deviance Residuals:
#      Min        1Q    Median        3Q       Max
# -0.78318  -0.20641   0.07689   0.24375   0.49237
#
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept)  1.687445   0.319872   5.275 6.18e-05 ***
# X1          -0.012525   0.004376  -2.862   0.0108 *
# X2          -0.116473   0.056959  -2.045   0.0567 .
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# (Dispersion parameter for gaussian family taken to be 0.146843)
#
# Null deviance: 5.0000 on 19 degrees of freedom
# Residual deviance: 2.4963 on 17 degrees of freedom
# AIC: 23.139
#
# Number of Fisher Scoring iterations: 2
Why do the results differ?
Upvotes: 1
Views: 253
Reputation: 206616
You have the family argument in the wrong place. It should be in the glm() call, not the summary() call.
summary(glm(Y ~ X1+X2, data=d, family=binomial(link='logit')))
If you don't include the family in the glm() call, it will do a gaussian (linear) regression.
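As a quick sanity check, here is a minimal sketch (the object names fit_wrong and fit_logit are just illustrative). The misplaced argument is silently absorbed by summary()'s ... argument, which is why no error was raised, while the model itself was fit with the gaussian default. Refitting with the family inside glm() should give coefficients close to the Excel values quoted in the question.
fit_wrong <- glm(Y ~ X1 + X2, data = d)   # no family given, so the gaussian default is used
fit_wrong$family$family                   # should return "gaussian"
fit_logit <- glm(Y ~ X1 + X2, data = d, family = binomial(link = 'logit'))
coef(fit_logit)                           # should be roughly b0 = 12.48, b1 = -0.117, b2 = -1.469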
Upvotes: 5