Leprechault
Leprechault

Reputation: 1823

Plot generalized linear mixed models (GLMMs): mixture of categorical and numeric variables

I'd like to plot the relationship between the number of ladenant response variable in function of Bioma (categorical) and temp (numeric) using binomial negative generalized linear mixed models (GLMMs) without success. I try to do:

#Packages
library(lme4)
library(ggplot2)
library(ggeffects)

#Open my dataset
myds<-read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/my_glmm_dataset.csv")
myds <- myds[,-c(3)] # remove bad character variable

# Negative binomial GLMM
m.laden.1  <- glmer.nb(ladenant ~ Bioma +  poly(temp,2) + scale(UR) + (1 | formigueiro), data = DataBase)

# Plot the results
mydf <- ggpredict(m.laden.1, terms = c("temp","Bioma"))
ggplot(mydf, aes(x, predicted), group = Bioma) +
  geom_point(DataBase, aes(temp, ladenant), alpha = 0.5) + # Observed ladenant response variable
  geom_line() +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = .1)

I have a not so good plot because I don't have one line for each Bioma and bad lines for temp variable:

my-plot

But the object mydf specification contains the Bioma variable:

mydf 

# # Predicted counts of ladenant

# # Bioma = Atlantic Forest

# temp | Predicted |         95% CI
# ---------------------------------
#   10 |      1.88 | [ 0.81,  4.35]
#   15 |     12.95 | [ 9.11, 18.40]
#   20 |     32.61 | [26.42, 40.25]
#   25 |     30.00 | [23.51, 38.28]
#   30 |     10.08 | [ 4.79, 21.24]
#   35 |      1.24 | [ 0.24,  6.43]

# # Bioma = Transition

# temp | Predicted |          95% CI
# ----------------------------------
#   10 |      6.84 | [ 3.04,  15.42]
#   15 |     47.17 | [34.05,  65.34]
#   20 |    118.79 | [92.27, 152.94]
#   25 |    109.29 | [76.84, 155.43]
#   30 |     36.73 | [16.17,  83.44]
#   35 |      4.51 | [ 0.82,  24.71]

# # Bioma = Pampa

# temp | Predicted |         95% CI
# ---------------------------------
#   10 |      1.42 | [ 0.70,  2.90]
#   15 |      9.80 | [ 7.47, 12.86]
#   20 |     24.69 | [18.74, 32.52]
#   25 |     22.71 | [16.46, 31.35]
#   30 |      7.63 | [ 3.65, 15.96]
#   35 |      0.94 | [ 0.19,  4.67]

# Adjusted for:
# *          UR = 82.78
# * formigueiro = 0 (population-level)

Plaese, any help to improve this plot?

Upvotes: 0

Views: 3531

Answers (2)

Daniel
Daniel

Reputation: 7832

You could also use plot(), which returns a ggplot object, and add additional layers as needed.

library(lme4)
#> Loading required package: Matrix
library(ggplot2)
library(ggeffects)

#Open my dataset
myds<-read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/my_glmm_dataset.csv")
myds <- myds[,-c(3)] # remove bad character variable

# Negative binomial GLMM
m.laden.1  <- glmer.nb(ladenant ~ Bioma +  poly(temp,2) + scale(UR) + (1 | formigueiro), data = myds)

mydf <- ggpredict(m.laden.1, terms = c("temp [all]","Bioma"))

plot(mydf, add.data = TRUE, ci = FALSE)

plot(mydf, add.data = TRUE, ci = FALSE) + ggplot2::scale_y_log10()
#> Scale for 'y' is already present. Adding another scale for 'y', which will
#> replace the existing scale.

Created on 2021-11-17 by the reprex package (v2.0.1)

Upvotes: 1

Allan Cameron
Allan Cameron

Reputation: 174546

I think you just need to be careful with the different names of the variables in the two objects myds and mydf, and where you place them in the calls to the various geoms:

library(lme4)
#> Loading required package: Matrix
library(ggplot2)
library(ggeffects)

#Open my dataset
myds<-read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/my_glmm_dataset.csv")
myds <- myds[,-c(3)] # remove bad character variable

# Negative binomial GLMM
m.laden.1 <- glmer.nb(ladenant ~ Bioma +  poly(temp,2) + scale(UR) + (1 | formigueiro),
                      data = myds)

# Plot the results
mydf <- ggpredict(m.laden.1, terms = c("temp [all]", "Bioma"))

ggplot(mydf, aes(x, predicted)) +
  geom_point(data=myds, aes(temp, ladenant, color = Bioma), alpha = 0.5) + 
  geom_line(aes(color = group)) +
  labs(x = "temp", y = "ladenant")

Note that I haven't included your geom_ribbon, because the conf.low and conf.high are all NA in the upper half of the curve, which makes it look pretty messy.

Incidentally, the plot might be more informative with a log y scale:

ggplot(mydf, aes(x, predicted)) +
  geom_point(data=myds, aes(temp, ladenant, color = Bioma), alpha = 0.5) + 
  geom_line(aes(color = group)) +
  scale_y_log10() +
  labs(x = "temp", y = "ladenant")

enter image description here

Created on 2021-11-12 by the reprex package (v2.0.0)

Upvotes: 3

Related Questions