Reputation: 13
I am trying to analyze the data set below
X wt solution process
1 21 1 1
2 36 2 1
3 25 3 1
4 18 4 1
5 22 5 1
6 26 1 2
7 38 2 2
8 27 3 2
9 17 4 2
10 26 5 2
11 16 1 3
12 25 2 3
13 22 3 3
14 18 4 3
15 21 5 3
16 28 1 4
17 35 2 4
18 27 3 4
19 20 4 4
20 24 5 4
The data is balanced, and I believe both effects are fixed. My code in R is below
str(wool)
##convert solution and process to factors
wool$solution<-as.factor(wool$solution)
wool$process<-as.factor(wool$process)
m1<-aov(wt~solution*process,data=wool)
plot(m1)
However, when I try to plot the model to check assumptions I get the following Error:
not plotting observations with leverage one:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20Error in qqnorm.default(rs, main = main, ylab = ylab23, ylim = ylim, ...) :
y is empty or has only NAs
I am not sure how to correct this, when I plot the model without interactions
m1<-aov(wt~solution+process,data=wool)
Everything works just fine, however I need to analyze the interaction to see if it is significant. Also when I just keep the factors as numeric then it works, but these are definitely categorical factors, each refers to a type of process and a type of solution used to treat the observations.
Any help is appreciated.
Upvotes: 1
Views: 1122
Reputation: 10855
The model with the interaction effect consumes 19 degrees of freedom. The degrees of freedom for the error term in an analysis of variance is N - k where N = total number of observations and k = the number of group effects.
Since N is 20 and the number of groups plus interactions, k, is also 20, the degrees of freedom across the model terms is 19. Therefore, the degrees of freedom for the error term is 0, and the leverage of each observation will be 1, meaning that each of the 19 parameters in the model plus the grand mean are completely dependent on the 20 observations in the data frame. This explains the error message returned the plot()
function.
rawData <- "X wt solution process
1 21 1 1
2 36 2 1
3 25 3 1
4 18 4 1
5 22 5 1
6 26 1 2
7 38 2 2
8 27 3 2
9 17 4 2
10 26 5 2
11 16 1 3
12 25 2 3
13 22 3 3
14 18 4 3
15 21 5 3
16 28 1 4
17 35 2 4
18 27 3 4
19 20 4 4
20 24 5 4"
wool <- read.table(text=rawData,header=TRUE)
str(wool)
##convert solution and process to factors
wool$solution<-as.factor(wool$solution)
wool$process<-as.factor(wool$process)
m1<-aov(wt~solution*process,data=wool)
summary(m1)
...and the output:
> m1<-aov(wt~solution*process,data=wool)
> summary(m1)
Df Sum Sq Mean Sq
solution 4 500.8 125.20
process 3 136.8 45.60
solution:process 12 87.2 7.27
The problem becomes clear when we print the ANOVA table with anova(m1)
.
> anova(m1)
Analysis of Variance Table
Response: wt
Df Sum Sq Mean Sq F value Pr(>F)
solution 4 500.8 125.200
process 3 136.8 45.600
solution:process 12 87.2 7.267
Residuals 0 0.0
Warning message:
In anova.lm(m1) :
ANOVA F-tests on an essentially perfect fit are unreliable
The insufficient degrees of freedom problem is even more obvious when we use lm()
to fit the model.
m2 <- lm(wt ~ solution*process,data=wool)
summary(m2)
...and the output:
> m2 <- lm(wt ~ solution*process,data=wool)
> summary(m2)
Call:
lm(formula = wt ~ solution * process, data = wool)
Residuals:
ALL 20 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 21 NA NA NA
solution2 15 NA NA NA
solution3 4 NA NA NA
solution4 -3 NA NA NA
solution5 1 NA NA NA
process2 5 NA NA NA
process3 -5 NA NA NA
process4 7 NA NA NA
solution2:process2 -3 NA NA NA
solution3:process2 -3 NA NA NA
solution4:process2 -6 NA NA NA
solution5:process2 -1 NA NA NA
solution2:process3 -6 NA NA NA
solution3:process3 2 NA NA NA
solution4:process3 5 NA NA NA
solution5:process3 4 NA NA NA
solution2:process4 -8 NA NA NA
solution3:process4 -5 NA NA NA
solution4:process4 -5 NA NA NA
solution5:process4 -5 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 19 and 0 DF, p-value: NA
Regarding the OP question about the code working when the analysis is run with lm()
using continuous variables, with continuous variables an interaction effect consumes a single degree of freedom, not (solutions - 1) * (processes - 1) or 12 degrees of freedom for the interaction between the two categorical variables.
Again, we can demonstrate this with lm()
.
wool$solution<-as.numeric(wool$solution)
wool$process<-as.numeric(wool$process)
m3 <- lm(wt ~ solution*process,data=wool)
summary(m3)
anova(m3)
...and the output:
> summary(m3)
Call:
lm(formula = wt ~ solution * process, data = wool)
Residuals:
Min 1Q Median 3Q Max
-11.460 -3.757 -0.180 2.320 12.000
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.9000 8.1454 3.548 0.00268 **
solution -1.5000 2.4559 -0.611 0.54993
process -0.0100 2.9743 -0.003 0.99736
solution:process 0.0300 0.8968 0.033 0.97373
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.341 on 16 degrees of freedom
Multiple R-squared: 0.1123, Adjusted R-squared: -0.05409
F-statistic: 0.675 on 3 and 16 DF, p-value: 0.5798
> anova(m3)
Analysis of Variance Table
Response: wt
Df Sum Sq Mean Sq F value Pr(>F)
solution 1 81.22 81.225 2.0200 0.1744
process 1 0.16 0.160 0.0040 0.9505
solution:process 1 0.05 0.045 0.0011 0.9737
Residuals 16 643.37 40.211
>
Upvotes: 1