Reputation: 868
I'm running a nested ANOVA with the following setup: 2 areas, one is reference, one is exposure (column named CI = Control/Impact). Two time periods (before and after impact, column named BA), with 1 year in the before period and 3 years in the after period. The years are nested.
My question is: if I use the original years (in column Time2 in the toy dataset), I get one result. If I rename the years, so that they're just 1 for Before and 1-3 for After, I get a different result.
Questions:
toy dataset:
toy <- structure(list(BA = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("A", "B"), class = "factor"), Time = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", "2", "3"), class = "factor"),
Time2 = structure(c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L
), .Label = c("11", "12", "13", "15", "16", "17"), class = "factor"),
Lake = c("Area 1", "Area 1", "Area 1", "Area 1", "Area 1",
"Area 2", "Area 2", "Area 2", "Area 2", "Area 2", "Area 1",
"Area 1", "Area 1", "Area 1", "Area 1", "Area 2", "Area 2",
"Area 2", "Area 2", "Area 2", "Area 1", "Area 1", "Area 1",
"Area 1", "Area 1", "Area 2", "Area 2", "Area 2", "Area 2",
"Area 2", "Area 1", "Area 1", "Area 1", "Area 1", "Area 1",
"Area 2", "Area 2", "Area 2", "Area 2", "Area 2"), CI = structure(c(2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("C", "I"), class = "factor"),
Response = c(78.3, 75.3, 69.4, 75.1, 71.1, 49.7, 61, 59.6,
35.3, 26.5, 80.9, 81.4, 67.6, 73.6, 73, 46.4, 73.6, 67.1,
34, 45.5, 86.6, 78, 68.2, 76.8, 69.6, 52.1, 61.9, 50.8, 39.2,
49.6, 72, 74, 71, 68, 58, 40, 41, 34, 54, 61)), .Names = c("BA",
"Time", "Time2", "Lake", "CI", "Response"), row.names = c(NA,
40L), class = "data.frame")
analysis using type 1 SS:
mod <- lm(Response ~ BA + CI + BA*CI + BA/Time + BA/Time*CI, data = toy)
mod1 <- lm(Response ~ BA + CI + BA*CI + BA/Time2 + BA/Time2*CI, data = toy)
# results are the same
anova(mod)
anova(mod1)
now try with type 2
library(car)
options(contrasts=c("contr.sum", "contr.poly"))
mod <- lm(Response ~ BA + CI + BA*CI + BA/Time + BA/Time*CI, data = toy)
mod1 <- lm(Response ~ BA + CI + BA*CI + BA/Time2 + BA/Time2*CI, data = toy)
Anova(mod, type = "II", singular.ok = TRUE)
Anova(mod1, type = "II", singular.ok = TRUE)
and type 3
Anova(mod, type = "III", singular.ok = TRUE)
Anova(mod1, type = "III", singular.ok = TRUE)
Upvotes: 7
Views: 1017
Reputation: 4711
Warning at the outset, bits of this probably aren't exactly right - I'm admittedly rusty on ANOVA at this point.
In a Type-III SS analysis we say that the main effects are qualified by the interaction. In short, that means that in the presence of higher order interactions, lower order interactions and main effects are less interpretable. Tradition pushes us towards Type-III analyses, but they really are a pain for this reason. Anyway...
Let's take a quick peek at your recode. The effect of your recode was that Time2
went from having four distinct values to having three distinct values in Time
. You may hold onto some interpretability because the combination of BA
at level B is unique to the time values that were previously 13, but are now 1.
Let's go back to your data. Now, BA:Time2 carries all of the same information as BA:CI. How does this look in your Time2 ANOVA results...
> Anova(mod1, type = "III", singular.ok = TRUE)
Anova Table (Type III tests)
Response: Response
Sum Sq Df F value Pr(>F)
(Intercept) 73945 1 712.5963 < 2.2e-16 ***
BA 246 1 2.3678 0.1337
CI 2484 1 23.9401 2.713e-05 ***
BA:CI 0 1 0.0046 0.9462
BA:Time2 95 2 0.4570 0.6372
BA:CI:Time2 37 2 0.1797 0.8364
Residuals 3321 32
... SS for BA:CI is, as expected (more or less) 0. Contrast this with the Time model...
> Anova(mod, type = "III", singular.ok = TRUE)
Anova Table (Type III tests)
Response: Response
Sum Sq Df F value Pr(>F)
(Intercept) 107772 1 1038.5835 < 2.2e-16 ***
BA 209 1 2.0099 0.1659
CI 4220 1 40.6655 3.661e-07 ***
BA:CI 9 1 0.0907 0.7653
BA:Time 95 2 0.4570 0.6372
BA:CI:Time 37 2 0.1797 0.8364
Residuals 3321 32
BA:CI got to hold onto a bit of variance... and the rest of the model seemed to have done better as well.
My sense is that you have a poorly specified data set for ANOVA under all coding schemes. Level B of AB is confounded with one level of your time group under both ways of coding things. Also take special note of the comments in package:car for Anova, particularly with regard to the singular.ok arguement. In short it says 'defaults to TRUE for type-II tests, and FALSE for type-III tests (where the tests for models with aliased coefficients will not be straightforwardly interpretable)' ... and here you are with seemingly aliased coefficients.
... nevermind that Fox's description of what he has done with Type II and Type III tests is... challenging to interpret. This is a reminder for me as to why I always used package:ez for ezANOVA() back in the day when I had to tolerate non-type I testing.
Upvotes: 1