Reputation: 53
I have a logistic regression with 5 predictors (each with 2 or 3 levels). I obtained the coefficients of the model and now I would like to obtain probabilities. Is there a way to obtain probabilities of all possible predictors' combinations?
I am used to find the probability using P = 1/1+e^-z, with z being the equation of LR. I usually replace the values with the desired profile. E.g. male, age less than 25, worker, higher education, living alone. With 5 predictors (each with 2 or 3 levels), I will have many profiles and probabilities to calculate. I am wondering if there is a way to automate this, especially if there is a different number of predictors. In other words, extract the coefficients from LR, calculate the probabilities associated with different profiles.
The outcome should be like this:
profile | probability |
---|---|
male, age less than 25, worker, higher education, living alone | 0.12 |
male, age more than 25, worker, higher education, partnered | 0.15 |
female, age more than 25, worker, higher education, partnered | 0.13 |
female, age less than 25, worker, highschool, living alone | 0.09 |
male, age less than 25, worker, highschool, living alone | 0.05 |
etc.
I tried doing the following for a hypothetical situation, but this requires writing down the variables in order and detailing the levels, which is error-prone. I am also not sure this is the proper way to do it (kind of new to R, so I am unsure about my code either) since I am not including the intercept.
#saving coefficient in a vector
coef <- c(coef_predictor1, coef_predictor2, coef_predictor3, coef_predictor4, coef_predictor5)
#matrix of predictors levels
predictor_combinations <- expand.grid(
predictor1 = c(0, 1), # Assuming binary predictors
predictor2 = c(0, 1),
predictor3 = c(0, 1),
predictor4 = c(0, 1),
predictor5 = c(0, 1)
)
#different profiles
profiles <- predictor_combinations %*% coef
#calculating probabilities
probabilities <- 1 / (1 + exp(-profiles))
Upvotes: 0
Views: 310
Reputation: 4725
I am going to suppose you are using glm()
for fitting and in that case you can use predict()
to find the probabilities.
First I need to create some example data and fit the model (it would have been easier, had you made your question reproducible):
# use two Iris species as response
dat <- subset(iris, Species != 'setosa')
# make variables categorical
dat$Species <- factor(dat$Species)
for (var in setdiff(names(dat), 'Species')) {
dat[[paste0(var, '_f')]] <- cut(dat[[var]], c(0, quantile(dat[[var]], 1:3 / 3)))
}
m <- glm(Species ~ Sepal.Length_f + Petal.Length_f, dat, family=binomial)
The rest is rather straightforward: I use model.frame()
to extract data used by the model and generate predictor level combinations ([, -1]
to exclude the response).
pred_combinations <- expand.grid(lapply(model.frame(m)[, -1], levels))
probs <- predict(m, newdata = pred_combinations, type = 'response')
data.frame(profile = apply(pred_combinations, 1, paste, collapse=', '),
probability = probs)
# profile probability
# 1 (0,6], (0,4.5] 0.032403657
# 2 (6,6.5], (0,4.5] 0.009159104
# 3 (6.5,7.9], (0,4.5] 0.002669656
# 4 (0,6], (4.5,5.2] 0.806029394
# 5 (6,6.5], (4.5,5.2] 0.534234260
# 6 (6.5,7.9], (4.5,5.2] 0.249332586
# 7 (0,6], (5.2,6.9] 1.000000000
# 8 (6,6.5], (5.2,6.9] 0.999999999
# 9 (6.5,7.9], (5.2,6.9] 0.999999995
Upvotes: 0