Reputation: 13
I am trying to replicate a 3 Factor nested ANOVA anlaysis in a paper: Underwood, AJ (1993) The Mechanics of spatially replicated sampling programmes to detect environmental impacts in a variable world.
The data for the example (from Table 3, Underwood 1993) can be produced by:
dat <-
structure(list(B = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("A", "B"), class = "factor"), C = structure(c(2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("C", "I"), class = "factor"),
Times = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"),
Locations = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L,
1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L,
2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L,
1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L), X = c(59L, 51L, 45L, 46L, 40L, 32L, 39L, 32L, 25L, 51L,
44L, 37L, 55L, 47L, 41L, 31L, 38L, 45L, 41L, 47L, 55L, 43L,
36L, 29L, 23L, 30L, 37L, 57L, 50L, 43L, 36L, 44L, 51L, 39L,
29L, 23L, 38L, 44L, 52L, 31L, 38L, 45L, 42L, 35L, 28L, 52L,
44L, 37L, 51L, 43L, 37L, 38L, 31L, 24L, 60L, 52L, 46L, 30L,
37L, 44L, 41L, 34L, 27L, 53L, 46L, 39L, 40L, 34L, 26L, 21L,
27L, 35L), Times.unique = structure(c(5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("A_1", "A_2", "A_3",
"A_4", "B_1", "B_2", "B_3", "B_4"), class = "factor")), .Names = c("B",
"C", "Times", "Locations", "Y", "Times.unique"), row.names = c(NA,
-72L), class = "data.frame")
dat
The data frame dat has 4 factors:
B - has two levels "A" and "B" (before v after)
Times - 8 levels, 4 within before "B" and 4 within after "A", coded as 1:4 within each. note that variable Times.unique is the same thing but with a unique code for each time (before and after)
Locations - has three levels, all measured every time both before and after
C - has two levels control (C) and (I). note: two locations are control and one is impact
While I am clear on how to analyse such a design using mixed models (lmer), I would like to replicate his example exactly so that I can run some simulations to compare his method.
In particular I am attempting to replicate the SS values presented in table 4 under column "a". He fits a design that has SS and df values for the following terms:
B -> SS = 66.13, df = 1
Times(B) -> SS = 280.64, df = 6
Locations -> SS = 283.86, df = 2
B x Locations -> SS = 29.26, df = 2
Times(B) x Locations-> SS = 575.45, df = 12
Residual -> SS = 2420.00, df = 48
Total -> SS = 6208.34, df = 71
I assume the Times(B) term represents Times nested within the Before/After treatment "B". For this example he ignores that Locations are from control and impact treatments and leaves out factor C altogether.
I have tried all possible combinations I can think of to reproduce this nested anova, using both unique Times coding and Times coded as 1:4 within B (before and after). I have tried using %in%, / and Error() arguments, as well as Anova from car to change the type of SS calculated. Examples of the %in% and / nested fits include:
aov(Y~B+Locations+Times%in%B+B:Locations+Times%in%B:Locations, data=dat)
aov(Y~B+Locations+B/Times+B:Locations+B/Times:Locations, data=dat)
I seem to be unable to replicate Underwood's SS values exactly, particularly for the two interaction terms. A friend let me fit the model in statistix, where the SS values can be reproduced exactly, so it is possible to obtain the above SS values for this model.
Can anyone help me fit this model in R? I wish to embed it in a larger simulation and really need to be able to run the model in R, such that the Underwood 1993 SS values are reproduced exactly?
Upvotes: 1
Views: 8392
Reputation: 78590
Your problem is that dat$Locations
is an integer, when it should be a factor (three unique locations). One hint is that your ANOVA line thinks Locations takes up only 1 df, while Underwood gives it 2.
Simply add the line:
dat$Locations = factor(dat$Locations)
And then your line of code reproduces the Underwood results perfectly:
aov(Y~B+Locations+B/Times+B:Locations+B/Times:Locations, data=dat)
#Call:
# aov(formula = Y ~ B + Locations + B/Times + B:Locations + B/Times:Locations,
# data = dat)
#
#Terms:
# B Locations B:Times B:Locations B:Locations:Times
#Sum of Squares 66.1250 2836.8611 280.6389 29.2500 575.4444
#Deg. of Freedom 1 2 6 2 12
# Residuals
#Sum of Squares 2420.0000
#Deg. of Freedom 48
Upvotes: 1