Uknowepi
Uknowepi

Reputation: 53

Obtain all possible probabilities from different variables combinations from LR

I have a logistic regression with 5 predictors (each with 2 or 3 levels). I obtained the coefficients of the model and now I would like to obtain probabilities. Is there a way to obtain probabilities of all possible predictors' combinations?

I am used to find the probability using P = 1/1+e^-z, with z being the equation of LR. I usually replace the values with the desired profile. E.g. male, age less than 25, worker, higher education, living alone. With 5 predictors (each with 2 or 3 levels), I will have many profiles and probabilities to calculate. I am wondering if there is a way to automate this, especially if there is a different number of predictors. In other words, extract the coefficients from LR, calculate the probabilities associated with different profiles.

The outcome should be like this:

profile probability
male, age less than 25, worker, higher education, living alone 0.12
male, age more than 25, worker, higher education, partnered 0.15
female, age more than 25, worker, higher education, partnered 0.13
female, age less than 25, worker, highschool, living alone 0.09
male, age less than 25, worker, highschool, living alone 0.05

etc.

I tried doing the following for a hypothetical situation, but this requires writing down the variables in order and detailing the levels, which is error-prone. I am also not sure this is the proper way to do it (kind of new to R, so I am unsure about my code either) since I am not including the intercept.

#saving coefficient in a vector
coef <- c(coef_predictor1, coef_predictor2, coef_predictor3, coef_predictor4, coef_predictor5)

#matrix of predictors levels
predictor_combinations <- expand.grid(
  predictor1 = c(0, 1),  # Assuming binary predictors
  predictor2 = c(0, 1),
  predictor3 = c(0, 1),
  predictor4 = c(0, 1),
  predictor5 = c(0, 1)
)

#different profiles
profiles <- predictor_combinations %*% coef

#calculating probabilities 
probabilities <- 1 / (1 + exp(-profiles))

Upvotes: 0

Views: 310

Answers (1)

Robert Hacken
Robert Hacken

Reputation: 4725

I am going to suppose you are using glm() for fitting and in that case you can use predict() to find the probabilities.

First I need to create some example data and fit the model (it would have been easier, had you made your question reproducible):

# use two Iris species as response
dat <- subset(iris, Species != 'setosa')
# make variables categorical
dat$Species <- factor(dat$Species)
for (var in setdiff(names(dat), 'Species')) {
  dat[[paste0(var, '_f')]] <- cut(dat[[var]], c(0, quantile(dat[[var]], 1:3 / 3)))
}

m <- glm(Species ~ Sepal.Length_f + Petal.Length_f, dat, family=binomial)

The rest is rather straightforward: I use model.frame() to extract data used by the model and generate predictor level combinations ([, -1] to exclude the response).

pred_combinations <- expand.grid(lapply(model.frame(m)[, -1], levels))
probs <- predict(m, newdata = pred_combinations, type = 'response')
data.frame(profile = apply(pred_combinations, 1, paste, collapse=', '),
           probability = probs)
#                profile probability
# 1       (0,6], (0,4.5] 0.032403657
# 2     (6,6.5], (0,4.5] 0.009159104
# 3   (6.5,7.9], (0,4.5] 0.002669656
# 4     (0,6], (4.5,5.2] 0.806029394
# 5   (6,6.5], (4.5,5.2] 0.534234260
# 6 (6.5,7.9], (4.5,5.2] 0.249332586
# 7     (0,6], (5.2,6.9] 1.000000000
# 8   (6,6.5], (5.2,6.9] 0.999999999
# 9 (6.5,7.9], (5.2,6.9] 0.999999995

Upvotes: 0

Related Questions