Lisa H
Lisa H

Reputation: 31

What is wrong with my syntax in lme4::lmer() for a split-split plot design with unbalanced repeated measures?

I am trying to use the lme4 package in R and function lmer() to fit a model for my split-split plot design. I would have used a repeated measures ANOVA if I did not have a small number of observations missing, but the missing data should be no problem with a linear mixed effects model.

My data frame (data) has a simple structure with four factors and a numeric outcome variable called all_vai. Note that in this example data frame, not all levels of all factors are crossed even though they would be in my real data (except for the missing observations). It shouldn't matter for my question, which is an attempt to fix problematic syntax.

collected_vai <- rnorm(125, mean = 6, sd = 1)
missing <- rep(NA, times = 3)
all_vai <- c(collected_vai, missing)
year1 <- rep(2018, times = 32)
year2 <- rep(2019, times = 32)
year3 <- rep(2020, times = 32)
year4 <- rep(2021, times = 32)
year <- c(year1, year2, year3, year4)
disturbance_severity <- rep(c(0,45,65,85), each = 32)
treatment <- rep(c("B" , "T"), each = 64)
replicate <- rep(c("A", "B", "C", "D"), each = 32)
data = data.frame(all_vai, year, disturbance_severity, treatment, replicate)
data$year <- as.factor(data$year)
data$disturbance_severity <- as.factor(data$disturbance_severity)
data$treatment <- as.factor(data$treatment)
data$replicate <- as.factor(data$replicate)

Here is the model I ran for an identical data set with a different (normally distributed) numeric outcome and no missing observations -- i.e., this is the model I would be running if I didn't have unbalanced repeated measures now due to missing data:

VAImodel1 <- aov(all_vai ~ disturbance_severity*treatment*year + Error(replicate/disturbance_severity/treatment/year), data = data)
summary(VAImodel1)

When I run this, I get the error message: "Warning message: In aov(mean_vai ~ disturbance_severity * treatment * Year + Error(Replicate/disturbance_severity/treatment/Year), : Error() model is singular"

I have observations from different years nested within different treatments, which are nested within different disturbance severities, and all of this nested within replicates (which are experimental blocks). So I tried using this structure in lme4:

library(lme4)
library(lmerTest)

VAImodel2 <- lmer(all_vai ~ (year|replicate:disturbance_severity:treatment) + disturbance_severity*treatment*year, data = data)
summary(VAImodel2)

And this is the error message I get: "Error: number of observations (=125) <= number of random effects (=128) for term (Year | Replicate:disturbance_severity:treatment); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable"

Next I tried simplifying my model so that I was not running out of degrees of freedom, by removing the treatment variable and interaction term, like so:

VAImodel3 <- lmer(all_vai ~ (year|replicate:disturbance_severity) + disturbance_severity*year, data = data)
summary(VAImodel3)

This time I get a different error: "boundary (singular) fit: see ?isSingular Warning message: Model failed to converge with 1 negative eigenvalue: -1.2e-01 "

Thank you in advance for any help.

Upvotes: 3

Views: 636

Answers (1)

Marek Fiołka
Marek Fiołka

Reputation: 4949

Your problem is wrong data preparation!!

Let's start by defining values for your variables year, disturbance_severity, treatment, replicate.

library(tidyverse)

set.seed(123)
yars = 2018:2021
disturbances = c(0,45,65,85)
treatments = c("B" , "T")
replicates = c("A", "B", "C", "D")
n = length(yars)*length(disturbances)*length(treatments)*length(replicates)*1
nNA=3

Please note that I first created the variables yars, disturbances, treatments and replicates with all the allowed values.

Then I calculated the amount of data in n (you can increase the last value in the multiplication from 1 e.g. to 10) and determined how many values will be missing in the variable nNA.

The key aspect is the use of the function expand.grid(yars, disturbances, treatments, replicates) which will return the appropriate table with the correct distribution of values.

Look at the first few lines of what expand.grid returns.

    Var1 Var2 Var3 Var4
1   2018    0    B    A
2   2019    0    B    A
3   2020    0    B    A
4   2021    0    B    A
5   2018   45    B    A
6   2019   45    B    A
7   2020   45    B    A
8   2021   45    B    A
9   2018   65    B    A
10  2019   65    B    A
11  2020   65    B    A
12  2021   65    B    A
13  2018   85    B    A
14  2019   85    B    A
15  2020   85    B    A
16  2021   85    B    A
17  2018    0    T    A
18  2019    0    T    A

This is crucial here. The next step is straight ahead. We create a tibble sequence and put it in the aov function.

data = tibble(sample(c(rnorm(n-nNA, mean = 6, sd = 1), rep(NA, nNA)), n)) %>% 
  mutate(expand.grid(yars, disturbances, treatments, replicates)) %>% 
  rename_with(~c("all_vai", "year", "disturbance_severity", "treatment", "replicate"))

VAImodel1 <- aov(all_vai ~ disturbance_severity*treatment*year + 
                   Error(replicate/disturbance_severity/treatment/year), data = data)
summary(VAImodel1)

output

Error: replicate
                     Df Sum Sq Mean Sq F value Pr(>F)
disturbance_severity  1 0.1341  0.1341   0.093  0.811
treatment             1 0.0384  0.0384   0.027  0.897
Residuals             1 1.4410  1.4410               

Error: replicate:disturbance_severity
                     Df Sum Sq Mean Sq F value Pr(>F)
disturbance_severity  1 0.1391  0.1391   0.152  0.763
treatment             1 0.1819  0.1819   0.199  0.733
year                  1 1.4106  1.4106   1.545  0.431
Residuals             1 0.9129  0.9129               

Error: replicate:disturbance_severity:treatment
          Df Sum Sq Mean Sq F value Pr(>F)
treatment  1 0.4647  0.4647   0.698  0.491
year       1 0.8127  0.8127   1.221  0.384
Residuals  2 1.3311  0.6655               

Error: replicate:disturbance_severity:treatment:year
               Df Sum Sq Mean Sq F value Pr(>F)
treatment       1  2.885  2.8846   3.001  0.144
year            1  0.373  0.3734   0.388  0.560
treatment:year  1  0.002  0.0015   0.002  0.970
Residuals       5  4.806  0.9612               

Error: Within
                Df Sum Sq Mean Sq F value Pr(>F)  
treatment        1   0.03   0.031   0.039 0.8430  
year             1   1.29   1.292   1.662 0.2002  
treatment:year   1   4.30   4.299   5.532 0.0206 *
Residuals      102  79.26   0.777                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now there are no model is singular errors!!

Upvotes: 4

Related Questions