Reputation: 31
I am trying to use the lme4 package in R and function lmer() to fit a model for my split-split plot design. I would have used a repeated measures ANOVA if I did not have a small number of observations missing, but the missing data should be no problem with a linear mixed effects model.
My data frame (data
) has a simple structure with four factors and a numeric outcome variable called all_vai
. Note that in this example data frame, not all levels of all factors are crossed even though they would be in my real data (except for the missing observations). It shouldn't matter for my question, which is an attempt to fix problematic syntax.
collected_vai <- rnorm(125, mean = 6, sd = 1)
missing <- rep(NA, times = 3)
all_vai <- c(collected_vai, missing)
year1 <- rep(2018, times = 32)
year2 <- rep(2019, times = 32)
year3 <- rep(2020, times = 32)
year4 <- rep(2021, times = 32)
year <- c(year1, year2, year3, year4)
disturbance_severity <- rep(c(0,45,65,85), each = 32)
treatment <- rep(c("B" , "T"), each = 64)
replicate <- rep(c("A", "B", "C", "D"), each = 32)
data = data.frame(all_vai, year, disturbance_severity, treatment, replicate)
data$year <- as.factor(data$year)
data$disturbance_severity <- as.factor(data$disturbance_severity)
data$treatment <- as.factor(data$treatment)
data$replicate <- as.factor(data$replicate)
Here is the model I ran for an identical data set with a different (normally distributed) numeric outcome and no missing observations -- i.e., this is the model I would be running if I didn't have unbalanced repeated measures now due to missing data:
VAImodel1 <- aov(all_vai ~ disturbance_severity*treatment*year + Error(replicate/disturbance_severity/treatment/year), data = data)
summary(VAImodel1)
When I run this, I get the error message: "Warning message: In aov(mean_vai ~ disturbance_severity * treatment * Year + Error(Replicate/disturbance_severity/treatment/Year), : Error() model is singular"
I have observations from different years nested within different treatments, which are nested within different disturbance severities, and all of this nested within replicates (which are experimental blocks). So I tried using this structure in lme4:
library(lme4)
library(lmerTest)
VAImodel2 <- lmer(all_vai ~ (year|replicate:disturbance_severity:treatment) + disturbance_severity*treatment*year, data = data)
summary(VAImodel2)
And this is the error message I get: "Error: number of observations (=125) <= number of random effects (=128) for term (Year | Replicate:disturbance_severity:treatment); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable"
Next I tried simplifying my model so that I was not running out of degrees of freedom, by removing the treatment variable and interaction term, like so:
VAImodel3 <- lmer(all_vai ~ (year|replicate:disturbance_severity) + disturbance_severity*year, data = data)
summary(VAImodel3)
This time I get a different error: "boundary (singular) fit: see ?isSingular Warning message: Model failed to converge with 1 negative eigenvalue: -1.2e-01 "
Thank you in advance for any help.
Upvotes: 3
Views: 636
Reputation: 4949
Your problem is wrong data preparation!!
Let's start by defining values for your variables year
, disturbance_severity
, treatment
, replicate
.
library(tidyverse)
set.seed(123)
yars = 2018:2021
disturbances = c(0,45,65,85)
treatments = c("B" , "T")
replicates = c("A", "B", "C", "D")
n = length(yars)*length(disturbances)*length(treatments)*length(replicates)*1
nNA=3
Please note that I first created the variables yars
, disturbances
, treatments
and replicates
with all the allowed values.
Then I calculated the amount of data in n
(you can increase the last value in the multiplication from 1
e.g. to 10
) and determined how many values will be missing in the variable nNA
.
The key aspect is the use of the function expand.grid(yars, disturbances, treatments, replicates)
which will return the appropriate table with the correct distribution of values.
Look at the first few lines of what expand.grid
returns.
Var1 Var2 Var3 Var4
1 2018 0 B A
2 2019 0 B A
3 2020 0 B A
4 2021 0 B A
5 2018 45 B A
6 2019 45 B A
7 2020 45 B A
8 2021 45 B A
9 2018 65 B A
10 2019 65 B A
11 2020 65 B A
12 2021 65 B A
13 2018 85 B A
14 2019 85 B A
15 2020 85 B A
16 2021 85 B A
17 2018 0 T A
18 2019 0 T A
This is crucial here.
The next step is straight ahead. We create a tibble
sequence and put it in the aov
function.
data = tibble(sample(c(rnorm(n-nNA, mean = 6, sd = 1), rep(NA, nNA)), n)) %>%
mutate(expand.grid(yars, disturbances, treatments, replicates)) %>%
rename_with(~c("all_vai", "year", "disturbance_severity", "treatment", "replicate"))
VAImodel1 <- aov(all_vai ~ disturbance_severity*treatment*year +
Error(replicate/disturbance_severity/treatment/year), data = data)
summary(VAImodel1)
output
Error: replicate
Df Sum Sq Mean Sq F value Pr(>F)
disturbance_severity 1 0.1341 0.1341 0.093 0.811
treatment 1 0.0384 0.0384 0.027 0.897
Residuals 1 1.4410 1.4410
Error: replicate:disturbance_severity
Df Sum Sq Mean Sq F value Pr(>F)
disturbance_severity 1 0.1391 0.1391 0.152 0.763
treatment 1 0.1819 0.1819 0.199 0.733
year 1 1.4106 1.4106 1.545 0.431
Residuals 1 0.9129 0.9129
Error: replicate:disturbance_severity:treatment
Df Sum Sq Mean Sq F value Pr(>F)
treatment 1 0.4647 0.4647 0.698 0.491
year 1 0.8127 0.8127 1.221 0.384
Residuals 2 1.3311 0.6655
Error: replicate:disturbance_severity:treatment:year
Df Sum Sq Mean Sq F value Pr(>F)
treatment 1 2.885 2.8846 3.001 0.144
year 1 0.373 0.3734 0.388 0.560
treatment:year 1 0.002 0.0015 0.002 0.970
Residuals 5 4.806 0.9612
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
treatment 1 0.03 0.031 0.039 0.8430
year 1 1.29 1.292 1.662 0.2002
treatment:year 1 4.30 4.299 5.532 0.0206 *
Residuals 102 79.26 0.777
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Now there are no model is singular errors!!
Upvotes: 4