Aidan Kendall
Aidan Kendall

Reputation: 1

Unequal variance in R linear mixed model post-hoc lsmeans issue

I fitted a linear mixed model to agricultural data, accounting for unequal variances across groups (cultivars) by passing weights = varIdent(...) to lme. lsmeans is showing standard errors that do not align with significant differences.

An example of my code is below. Data to reproduce the output can be found here.

library(nlme)
library(multcomp)
library(emmeans)

model <- lme(variable ~ cultivar*year, 
             random = ~1|block, 
             weights = varIdent(form = ~1|cultivar),  
             method = "REML", 
             na.action = na.omit, 
             data = ag.data)

Leastsquare <- lsmeans(model,"cultivar")
cld(Leastsquare, Letters = letters)
 cultivar  lsmean     SE df lower.CL upper.CL .group
 Golden      6.92  3.841  1    -41.9     55.7  a    
 Campfield  10.33  4.330  1    -44.7     65.4  a    
 Tom        17.50  0.167  1     15.4     19.6  a    
 Harrison   25.67 12.649  1   -135.1    186.4  ab   
 Puget      30.58 20.502  1   -229.9    291.1  ab   
 HVC        37.08  5.331  1    -30.7    104.8   b   
 COL        38.08  0.433  1     32.6     43.6   b   
 Brown      62.67 20.207  1   -194.1    319.4  ab   

How can it be that the the cultivar Brown is not significantly different from Golden? Is this acceptable? Has anyone seen results similar to this?

Upvotes: 0

Views: 1275

Answers (1)

Paul Schmidt
Paul Schmidt

Reputation: 1219

I made your example into a reprex. Please find my comments below.

library(tidyverse)

ag.data <- tibble::tribble(
       ~year,   ~cultivar, ~block,   ~variable,
  "nineteen",       "HVC",     1L, 14.33333333,
  "nineteen",       "HVC",     2L, 23.33333333,
  "nineteen",     "Puget",     1L, 2.333333333,
  "nineteen",     "Puget",     2L, 3.333333333,
  "nineteen", "Campfield",     1L,          NA,
  "nineteen", "Campfield",     2L,           4,
  "nineteen",       "Tom",     1L,          10,
  "nineteen",       "Tom",     2L,          10,
  "nineteen",     "Brown",     1L,          NA,
  "nineteen",     "Brown",     2L, 56.66666667,
  "nineteen",       "COL",     1L,          NA,
  "nineteen",       "COL",     2L, 51.66666667,
  "nineteen",    "Golden",     1L,           5,
  "nineteen",    "Golden",     2L, 1.666666667,
  "nineteen",  "Harrison",     1L, 52.33333333,
  "nineteen",  "Harrison",     2L, 4.333333333,
    "twenty",       "HVC",     1L, 45.66666667,
    "twenty",       "HVC",     2L,          65,
    "twenty",     "Puget",     1L, 17.33333333,
    "twenty",     "Puget",     2L, 99.33333333,
    "twenty", "Campfield",     1L, 11.66666667,
    "twenty", "Campfield",     2L, 21.66666667,
    "twenty",       "Tom",     1L, 25.33333333,
    "twenty",       "Tom",     2L, 24.66666667,
    "twenty",     "Brown",     1L, 45.33333333,
    "twenty",     "Brown",     2L,          92,
    "twenty",       "COL",     1L,          24,
    "twenty",       "COL",     2L,          25,
    "twenty",    "Golden",     1L,           3,
    "twenty",    "Golden",     2L,          18,
    "twenty",  "Harrison",     1L,          31,
    "twenty",  "Harrison",     2L,          15
  )

library(nlme)
library(emmeans)
library(multcomp)
library(multcompView)

model <- lme(
  variable ~ cultivar * year,
  random = ~ 1 | block,
  weights = varIdent(form = ~ 1 | cultivar),
  method = "REML",
  na.action = na.omit,
  data = ag.data
)

anova(model)
#>               numDF denDF   F-value p-value
#> (Intercept)       1    12 16502.874  <.0001
#> cultivar          7    12   193.823  <.0001
#> year              1    12   952.713  <.0001
#> cultivar:year     7    12   296.145  <.0001

ag.data %>%
  filter(!is.na(variable)) %>%
  ggplot(aes(y = variable, x = year)) +
  facet_wrap(vars(cultivar)) +
  geom_point() +
  stat_summary(fun = mean,
               color = "red",
               geom = "line",
               aes(group = 1)) +
  theme_bw()

emm <- emmeans(model, ~ cultivar) %>% 
  cld(Letters = letters) %>% 
  as_tibble() %>% 
  mutate(cultivar = fct_reorder(cultivar, emmean))
#> NOTE: Results may be misleading due to involvement in interactions

ggplot(emm, aes(
  y = emmean,
  ymin = lower.CL,
  ymax = upper.CL,
  x = cultivar,
  label = str_trim(.group)
)) +
  geom_point() +
  geom_errorbar(width = 0.1) +
  geom_text(
    position = position_nudge(x = 0.1),
    hjust = 0,
    color = "red"
  ) +
  theme_bw()

Created on 2022-01-24 by the reprex package (v2.0.1)

Addressing your question why the cultivars with the lowest and highest emmeans are not significantly different: I think looking at the second plot makes it clear that this is due to the vastly different precisions with which the emmeans were estimated. This in turn is in part due to the heterogeneous error variances you allowed in the model and also due to the missing/unbalanced data. I would argue that you are "unaccustomed to seeing the extreme values not be statistically different from one another when the intermediate values are" because usually with balanced data and/or no heterogeneous error variance you would not. Try running the code without the weights = argument (i.e. with standard homogeneous error variance) - you will not find the "issue" you are having with these results.

Further note that you actually seem to be having cultivar-year-interactions - see the anova() and the first plot. Thus, looking at the cultivar means may be misleading as the note below the emmeans() function says. Instead, you could try investigating emmean-comparisons per year via emmeans(~ cultivar|year).

Upvotes: 0

Related Questions