NewBee

Reputation: 1040

Convert glmer output (logit regression) to probabilities

I have data such as this, and I am running glmer() on each of the Q variables.

library(readr)
dat <- read_table2("condition   school  Q5_3    Q6  Q7_1    Q7_2    Q7_3    Q7_4    Q13_1   Q13_2   Q13_3
0   A   1   1   1   1   1   1   0   1   1
1   B   1   0   0   NA  NA  NA  NA  1   1
1   C   1   0   1   1   1   1   0   1   1
1   A   0   0   0   NA  NA  NA  NA  1   1
1   B   1   0   0   NA  NA  NA  NA  1   1
0   C   1   1   1   1   1   0   0   0   0
0   A   0   0   0   NA  NA  NA  NA  NA  NA
0   B   1   1   1   1   1   1   1   1   1
0   C   1   1   0   NA  NA  NA  NA  1   0
0   A   1   0   0   NA  NA  NA  NA  1   0
0   B   1   0   1   1   0   1   1   NA  NA
0   C   1   0   1   1   1   1   1   1   0
1   A   1   1   1   1   0   1   0   1   1
1   B   0   0   0   NA  NA  NA  NA  1   1
0   C   1   0   0   NA  NA  NA  NA  NA  NA
")

This is the loop that I am using to pull out the coefficients that I want.

library(lme4)
# Fit a mixed logistic regression for one outcome column,
# with a random intercept for school and condition as the fixed effect
models <- function(x)
{
  model1 <- glmer(x ~ (1|school) + condition, data = dat, family = binomial, na.action = na.exclude)
  return(model1)
}


# Fit one model per outcome column (dropping condition and school)
y <- apply(dat[,-c(1,2)], 2, models)
# Extract the coefficient table from one model as a data frame
extract <- function(x)
{
  z <- as.data.frame(summary(x)$coefficients)
  z$id <- rownames(z)
  z <- z[, c(dim(z)[2], 1:(dim(z)[2]-1))]  # move the id column first
  rownames(z) <- NULL
  return(z)
}
# Extract summaries from all models and combine them
DF <- as.data.frame(do.call(rbind, lapply(y, extract)))
# Recover the question name from the rownames (e.g. "Q5_3.1" -> "Q5_3")
DF$var <- gsub("\\..*", "", rownames(DF))
# Move the var column first
DF_glm <- DF[, c(dim(DF)[2], 1:(dim(DF)[2]-1))]
rownames(DF_glm) <- NULL

This loop works fine, but I need to convert the output (coefficients) from log odds to probabilities. Any suggestions on how to do this?

Upvotes: 2

Views: 2718

Answers (2)

Ben Bolker

Reputation: 226731

Bad news: there's not really any sensible way to convert coefficients of a logistic regression (which are on the log-odds-ratio or logit scale) to a probability scale. The conversion from log-odds to probabilities depends on the baseline level, so to get probabilities you would have to make predictions of probabilities for specific cases: see e.g. this CrossValidated question.
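
For example, to get predicted probabilities for specific cases from one of the fitted models, you can use predict() with type = "response". A minimal sketch, using the model list y from your loop (re.form = NA predicts for a "typical" school, i.e. with the random effect set to zero):

m1 <- y[["Q5_3"]]                          ## one of the fitted models
newdat <- data.frame(condition = c(0, 1))  ## the two conditions
## predicted probabilities on the response (probability) scale
predict(m1, newdata = newdat, type = "response", re.form = NA)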

Good news: exponentiating the coefficients gives you odds ratios, which are generally well understood and arguably easier to understand than the log-odds-ratio.
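
For a single model, you can exponentiate the fixed-effect coefficients directly, e.g. with one of the models from your loop:

exp(fixef(y[["Q5_3"]]))  ## odds ratios for the intercept and condition

To do this for all the models at once: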

library(broom.mixed)
dd <- dat[,-c(1,2)]
## find (and drop) columns with no variation
uu <- apply(dd, 2, function(x) length(unique(na.omit(x))))
## generate list of models
modList <- apply(dd[,uu>1], 2, models)
purrr::map_dfr(modList, tidy,
        effect="fixed",
        exponentiate=TRUE, .id="Q")

This gives you a table (tibble) with estimates on the odds ratio scale, standard errors, p-values, etc. There are other options, such as conf.int=TRUE if you want confidence intervals in the table. You can operate on it with any of the tidyverse tools (e.g. %>% filter(term=="condition") if you're not interested in the intercepts).
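
For example, combining those options (using the modList from above):

library(dplyr)
purrr::map_dfr(modList, tidy,
        effect="fixed",
        exponentiate=TRUE,
        conf.int=TRUE, .id="Q") %>%
    filter(term == "condition")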

Many of the estimates in this example are kind of bogus, but that's because your data set is too small ... I hope your real data set is bigger than this ...


Explaining why you can't generally convert odds ratios to probabilities (without specifying a baseline) is really more of a statistical/CrossValidated question, but I'll give a short example based on the UCLA stats site.

  • Import data: scale the predictor variables for GRE and GPA to get more interpretable parameter values.
library(tidyverse)
dd <- (haven::read_dta("https://stats.idre.ucla.edu/stat/stata/dae/binary.dta")
    %>% mutate_at(c("gre","gpa"), ~drop(scale(.)))
)
  • Fit the model and extract coefficients
m <- glm(admit~gre+gpa, family=binomial, dd)
cc <- coef(m)
## (Intercept)         gre         gpa 
##  -0.8097503   0.3108184   0.2872088
  • Transform the coefficients:

plogis() is the built-in R function for the inverse logit (logistic) transformation.
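
For example:

plogis(0.5)        ## 0.6224593
1/(1 + exp(-0.5))  ## the same: 0.6224593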

Transforming the intercept parameter does make sense: it gives the predicted probability for an individual with baseline characteristics; since we have centered the predictors, this corresponds to an individual with the population mean GPA and GRE.

int_prob <- plogis(cc["(Intercept)"])  ## 0.307

We could also predict the probability for an individual with the mean GRE and a GPA one standard deviation above the mean (the units of the GPA parameter are "per standard deviation" because we scaled the GPA variable by its standard deviation):

gre_prob <- with(as.list(cc), plogis(`(Intercept)`+gre)) ## 0.3777

We could calculate the difference between these predictions, which is one way of specifying the effect of GRE on the probability scale:

gre_prob-int_prob ## 0.0698

However, it only applies for this particular comparison (an individual with mean GPA and GRE 1 SD above the mean compared to an individual with the mean GPA and GRE). The change in probability per unit GRE would be different if we started from a different baseline or made the prediction for a different GRE change.
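
For example, the same 1-SD increase in GRE, starting instead from 1 SD below the mean, gives a smaller change in probability:

p_low <- with(as.list(cc), plogis(`(Intercept)` - gre))  ## GRE 1 SD below the mean
int_prob - p_low  ## 0.062, not the 0.0698 computed above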

You can logistic-transform the GRE coefficient if you want:

plogis(cc["gre"])  ## 0.577

What does this mean, though? It is the probability of success for an individual with a baseline log-odds of zero (which is not the individual with the average GPA and GRE) if you were then to increase their GRE by 1 standard deviation. Not something that's easy to explain ...

There are other rules of thumb/approximations for understanding the meaning of log-odds-ratios, e.g. the divide by 4 rule, but they all depend in some way on specifying a baseline level.
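
For example, dividing the GRE coefficient by 4 approximates the maximum change in probability per unit (here, per SD) of GRE, attained at a baseline probability of 0.5:

cc["gre"]/4          ## 0.0777: max change in probability per SD of GRE
## this is the slope of the logistic curve at p = 0.5
dlogis(0)*cc["gre"]  ## 0.25 * 0.311 = 0.0777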

Upvotes: 4

Duck

Reputation: 39613

You can try this. You will get warnings because of your data:

library(lme4)
# Preprocess the data.
# If you drop the NAs, some variables become constant, so the model
# cannot be fitted and will produce an error.
# I set NA to zero to get the models working -- please check
# whether this makes sense for your data.
dat[is.na(dat)] <- 0
# Fit a mixed logistic regression for one outcome column
models <- function(x)
{
  model1 <- glmer(x ~ (1|school) + condition, data = dat, family = binomial, na.action = na.exclude)
  return(model1)
}

y <- apply(dat[,-c(1,2)], 2, models)
# Extract results and convert the log-odds to odds and probabilities
extract <- function(x)
{
  z <- as.data.frame(summary(x)$coefficients)
  z$id <- rownames(z)
  z <- z[, c(dim(z)[2], 1:(dim(z)[2]-1))]
  z$odds <- exp(z$Estimate)         # log-odds -> odds
  z$prob <- z$odds / (1 + z$odds)   # equivalent to plogis(z$Estimate)
  rownames(z) <- NULL
  return(z)
}

# Extract summaries from all models and combine them
DF <- as.data.frame(do.call(rbind, lapply(y, extract)))
# Recover the question name from the rownames
DF$var <- gsub("\\..*", "", rownames(DF))
# Move the var column first
DF_glm <- DF[, c(dim(DF)[2], 1:(dim(DF)[2]-1))]
rownames(DF_glm) <- NULL

You will get this:

            id      Estimate   Std. Error       z value   Pr(>|z|)         odds
1  (Intercept)  2.079442e+00 1.060660e+00  1.960517e+00 0.04993534 8.000000e+00
2    condition -1.386294e+00 1.369306e+00 -1.012407e+00 0.31134363 2.500000e-01
3  (Intercept) -2.231436e-01 6.708203e-01 -3.326428e-01 0.73940393 8.000000e-01
4    condition -1.386294e+00 1.284523e+00 -1.079229e+00 0.28048568 2.500000e-01
5  (Intercept)  2.231436e-01 6.708203e-01  3.326428e-01 0.73940394 1.250000e+00
6    condition -9.162907e-01 1.095445e+00 -8.364553e-01 0.40289882 4.000000e-01
7  (Intercept)  2.231436e-01 6.708203e-01  3.326428e-01 0.73940394 1.250000e+00
8    condition -9.162907e-01 1.095445e+00 -8.364553e-01 0.40289882 4.000000e-01
9  (Intercept) -2.231436e-01 6.708204e-01 -3.326428e-01 0.73940397 8.000000e-01
10   condition -1.386294e+00 1.284523e+00 -1.079229e+00 0.28048583 2.500000e-01
11 (Intercept) -2.231436e-01 6.708204e-01 -3.326428e-01 0.73940395 8.000000e-01
12   condition -4.700036e-01 1.095445e+00 -4.290527e-01 0.66788485 6.250000e-01
13 (Intercept) -7.440587e-01 1.454336e+00 -5.116141e-01 0.60892109 4.751814e-01
14   condition -5.938497e+04 2.739708e+07 -2.167566e-03 0.99827053 0.000000e+00
15 (Intercept)  2.231435e-01 6.708204e-01  3.326427e-01 0.73940398 1.250000e+00
16   condition  3.442056e+01 1.351269e+07  2.547276e-06 0.99999797 8.884999e+14
17 (Intercept) -1.252763e+00 8.017837e-01 -1.562470e+00 0.11817732 2.857143e-01
18   condition  3.800559e+01 2.739708e+07  1.387213e-06 0.99999889 3.203452e+16
        prob   var
1  0.8888889  Q5_3
2  0.2000000  Q5_3
3  0.4444444    Q6
4  0.2000000    Q6
5  0.5555556  Q7_1
6  0.2857143  Q7_1
7  0.5555556  Q7_2
8  0.2857143  Q7_2
9  0.4444444  Q7_3
10 0.2000000  Q7_3
11 0.4444444  Q7_4
12 0.3846154  Q7_4
13 0.3221173 Q13_1
14 0.0000000 Q13_1
15 0.5555556 Q13_2
16 1.0000000 Q13_2
17 0.2222222 Q13_3
18 1.0000000 Q13_3

Upvotes: -2
