Reputation: 1
I fitted a linear mixed model to agricultural data, accounting for unequal variances across groups (cultivars) by passing weights = varIdent(...)
to lme
. lsmeans
is showing standard errors that do not align with significant differences.
An example of my code is below. Data to reproduce the output can be found here.
library(nlme)
library(multcomp)
library(emmeans)
model <- lme(variable ~ cultivar*year,
random = ~1|block,
weights = varIdent(form = ~1|cultivar),
method = "REML",
na.action = na.omit,
data = ag.data)
Leastsquare <- lsmeans(model,"cultivar")
cld(Leastsquare, Letters = letters)
cultivar lsmean SE df lower.CL upper.CL .group
Golden 6.92 3.841 1 -41.9 55.7 a
Campfield 10.33 4.330 1 -44.7 65.4 a
Tom 17.50 0.167 1 15.4 19.6 a
Harrison 25.67 12.649 1 -135.1 186.4 ab
Puget 30.58 20.502 1 -229.9 291.1 ab
HVC 37.08 5.331 1 -30.7 104.8 b
COL 38.08 0.433 1 32.6 43.6 b
Brown 62.67 20.207 1 -194.1 319.4 ab
How can it be that the the cultivar Brown
is not significantly different from Golden
? Is this acceptable? Has anyone seen results similar to this?
Upvotes: 0
Views: 1275
Reputation: 1219
I made your example into a reprex. Please find my comments below.
library(tidyverse)
ag.data <- tibble::tribble(
~year, ~cultivar, ~block, ~variable,
"nineteen", "HVC", 1L, 14.33333333,
"nineteen", "HVC", 2L, 23.33333333,
"nineteen", "Puget", 1L, 2.333333333,
"nineteen", "Puget", 2L, 3.333333333,
"nineteen", "Campfield", 1L, NA,
"nineteen", "Campfield", 2L, 4,
"nineteen", "Tom", 1L, 10,
"nineteen", "Tom", 2L, 10,
"nineteen", "Brown", 1L, NA,
"nineteen", "Brown", 2L, 56.66666667,
"nineteen", "COL", 1L, NA,
"nineteen", "COL", 2L, 51.66666667,
"nineteen", "Golden", 1L, 5,
"nineteen", "Golden", 2L, 1.666666667,
"nineteen", "Harrison", 1L, 52.33333333,
"nineteen", "Harrison", 2L, 4.333333333,
"twenty", "HVC", 1L, 45.66666667,
"twenty", "HVC", 2L, 65,
"twenty", "Puget", 1L, 17.33333333,
"twenty", "Puget", 2L, 99.33333333,
"twenty", "Campfield", 1L, 11.66666667,
"twenty", "Campfield", 2L, 21.66666667,
"twenty", "Tom", 1L, 25.33333333,
"twenty", "Tom", 2L, 24.66666667,
"twenty", "Brown", 1L, 45.33333333,
"twenty", "Brown", 2L, 92,
"twenty", "COL", 1L, 24,
"twenty", "COL", 2L, 25,
"twenty", "Golden", 1L, 3,
"twenty", "Golden", 2L, 18,
"twenty", "Harrison", 1L, 31,
"twenty", "Harrison", 2L, 15
)
library(nlme)
library(emmeans)
library(multcomp)
library(multcompView)
model <- lme(
variable ~ cultivar * year,
random = ~ 1 | block,
weights = varIdent(form = ~ 1 | cultivar),
method = "REML",
na.action = na.omit,
data = ag.data
)
anova(model)
#> numDF denDF F-value p-value
#> (Intercept) 1 12 16502.874 <.0001
#> cultivar 7 12 193.823 <.0001
#> year 1 12 952.713 <.0001
#> cultivar:year 7 12 296.145 <.0001
ag.data %>%
filter(!is.na(variable)) %>%
ggplot(aes(y = variable, x = year)) +
facet_wrap(vars(cultivar)) +
geom_point() +
stat_summary(fun = mean,
color = "red",
geom = "line",
aes(group = 1)) +
theme_bw()
emm <- emmeans(model, ~ cultivar) %>%
cld(Letters = letters) %>%
as_tibble() %>%
mutate(cultivar = fct_reorder(cultivar, emmean))
#> NOTE: Results may be misleading due to involvement in interactions
ggplot(emm, aes(
y = emmean,
ymin = lower.CL,
ymax = upper.CL,
x = cultivar,
label = str_trim(.group)
)) +
geom_point() +
geom_errorbar(width = 0.1) +
geom_text(
position = position_nudge(x = 0.1),
hjust = 0,
color = "red"
) +
theme_bw()
Created on 2022-01-24 by the reprex package (v2.0.1)
Addressing your question why the cultivars with the lowest and highest emmeans are not significantly different: I think looking at the second plot makes it clear that this is due to the vastly different precisions with which the emmeans were estimated. This in turn is in part due to the heterogeneous error variances you allowed in the model and also due to the missing/unbalanced data.
I would argue that you are "unaccustomed to seeing the extreme values not be statistically different from one another when the intermediate values are" because usually with balanced data and/or no heterogeneous error variance you would not. Try running the code without the weights =
argument (i.e. with standard homogeneous error variance) - you will not find the "issue" you are having with these results.
Further note that you actually seem to be having cultivar-year-interactions - see the anova()
and the first plot. Thus, looking at the cultivar means may be misleading as the note below the emmeans()
function says. Instead, you could try investigating emmean-comparisons per year via emmeans(~ cultivar|year)
.
Upvotes: 0