Reputation: 155
This might be a trivial question, but I don't know where to find answers. When using glm() for logistic regression in R, if the response variable Y has factor values 1 or 2, does the result of glm() correspond to logit(P(Y=1)) or logit(P(Y=2))? What if Y has logical values TRUE or FALSE?
Upvotes: 2
Views: 870
Reputation: 226971
Testing is good. If you want the documentation, it's in ?binomial (which is the same as ?family):
For the ‘binomial’ and ‘quasibinomial’ families the response can be specified in one of three ways:
- As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level).
- As a numerical vector with values between ‘0’ and ‘1’, interpreted as the proportion of successful cases (with the total number of cases given by the ‘weights’).
- As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.
It doesn't explicitly say what happens in the logical (TRUE/FALSE) case; for that you have to know that, when coercing logical to numeric values, FALSE → 0 and TRUE → 1.
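Both rules can be checked directly without fitting a model. This is a minimal sketch; the toy vectors `y_lgl` and `y_fct` are made up here for illustration and are not from the question.

```r
# Toy data (hypothetical, for illustration only)
y_lgl <- c(TRUE, FALSE, TRUE, TRUE)

# Logical -> numeric coercion: FALSE becomes 0, TRUE becomes 1
lgl_as_num <- as.numeric(y_lgl)
print(lgl_as_num)

# For a factor response, the first level is treated as "failure"
# and any other level as "success", per the ?family text quoted above
y_fct <- factor(c(2, 1, 2, 2))
failure_level <- levels(y_fct)[1]
print(failure_level)

# The 0/1 coding implied by that rule
fct_as_num <- as.numeric(y_fct != failure_level)
print(fct_as_num)
```

With these vectors, the factor with levels "1" and "2" and the logical vector end up with the same 0/1 coding, so for the question's factor the model is for logit(P(Y=2)).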
Upvotes: 3
Reputation: 174586
Why not just test it yourself?
output_bool <- c(rep(c(TRUE, FALSE), c(25, 75)), rep(c(TRUE, FALSE), c(75, 25)))
output_num <- c(rep(c(2, 1), c(25, 75)), rep(c(2, 1), c(75, 25)))
output_fact <- factor(output_num)
var <- rep(c("unlikely", "likely"), each = 100)
glm(output_bool ~ var, binomial)
#>
#> Call: glm(formula = output_bool ~ var, family = binomial)
#>
#> Coefficients:
#> (Intercept) varunlikely
#> 1.099 -2.197
#>
#> Degrees of Freedom: 199 Total (i.e. Null); 198 Residual
#> Null Deviance: 277.3
#> Residual Deviance: 224.9 AIC: 228.9
glm(output_num ~ var, binomial)
#> Error in eval(family$initialize): y values must be 0 <= y <= 1
glm(output_fact ~ var, binomial)
#>
#> Call: glm(formula = output_fact ~ var, family = binomial)
#>
#> Coefficients:
#> (Intercept) varunlikely
#> 1.099 -2.197
#>
#> Degrees of Freedom: 199 Total (i.e. Null); 198 Residual
#> Null Deviance: 277.3
#> Residual Deviance: 224.9 AIC: 228.9
So we get the expected answer if we use TRUE and FALSE, an error if we use 1 and 2 as numbers, and the expected answer if we use 1 and 2 as a two-level factor, provided the level standing for TRUE comes after the level standing for FALSE. However, we have to be careful about how the factor levels are ordered, or the model will be fit for the opposite outcome:
output_fact <- factor(output_fact, levels = c("2", "1"))
glm(output_fact ~ var, binomial)
#>
#> Call: glm(formula = output_fact ~ var, family = binomial)
#>
#> Coefficients:
#> (Intercept) varunlikely
#> -1.099 2.197
#>
#> Degrees of Freedom: 199 Total (i.e. Null); 198 Residual
#> Null Deviance: 277.3
#> Residual Deviance: 224.9 AIC: 228.9
(Notice that the intercept and the coefficient have flipped signs.)
Created on 2020-06-21 by the reprex package (v0.3.0)
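If you would rather choose the "failure" level explicitly than rely on the default ordering, one option is relevel(). The following is a sketch that rebuilds the data from the example above; the names `y_ref1`, `y_ref2`, `fit1`, and `fit2` are introduced here for illustration.

```r
# Rebuild the data from the example above
output_num <- c(rep(c(2, 1), c(25, 75)), rep(c(2, 1), c(75, 25)))
var <- rep(c("unlikely", "likely"), each = 100)

# "1" is the first level, so the model is for P(Y = 2)
y_ref1 <- factor(output_num, levels = c(1, 2))

# Make "2" the first level instead, so the model is for P(Y = 1)
y_ref2 <- relevel(y_ref1, ref = "2")

fit1 <- glm(y_ref1 ~ var, family = binomial)
fit2 <- glm(y_ref2 ~ var, family = binomial)

# Same model, opposite signs on every coefficient
signs_flip <- isTRUE(all.equal(unname(coef(fit1)), -unname(coef(fit2)),
                               tolerance = 1e-6))

# Fitted probabilities of the two outcomes are complementary
probs_sum_to_one <- isTRUE(all.equal(unname(fitted(fit1) + fitted(fit2)),
                                     rep(1, 200), tolerance = 1e-6))

print(c(signs_flip = signs_flip, probs_sum_to_one = probs_sum_to_one))
```

Both checks should come out TRUE up to numerical tolerance: the two fits describe the same relationship, and only the outcome being modelled as "success" changes.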
Upvotes: 3