Reputation: 6528
How can Levene's test be performed using the squared residuals rather than the absolute?
I have tried levene.test
in the lawstat
package, and leveneTest
in the car
package, which both use the absolute residuals.
The purpose is to reproduce SAS output, which uses squared residuals by default.
Upvotes: 1
Views: 2524
Reputation: 6528
iris.lm <- lm(Petal.Width ~ Species, data = iris)
anova(lm(residuals(iris.lm)^2 ~ iris$Species))
## Analysis of Variance Table
##
## Response: residuals(iris.lm)^2
## Df Sum Sq Mean Sq F value Pr(>F)
## iris$Species 2 0.100 0.0500 14.8 1.4e-06 ***
## Residuals 147 0.497 0.0034
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Perhaps it helps to understand how this works.
As is pointed out out here, Levene's test is just an ANOVA of the distance between each observation and the centre of its group. Different implementations of Levene's test differ by their definitions of “distance” and “centre”.
“Distance” can mean the absolute differences, or the squared differences.
“Centre” can mean the mean or the median.
SAS uses squared differences by default, and the mean. leveneTest
in the car
package of R uses the absolute differences only, and the median by default. So does levene.test
in the lawstat
package.
All four possible combinations can be done by hand as follows.
require(plyr)
x <- ddply(iris, .(Species), summarize
, abs.mean = abs(Petal.Width - mean(Petal.Width))
, abs.median = abs(Petal.Width - median(Petal.Width))
, squared.mean = (Petal.Width - mean(Petal.Width))^2
, squared.median = (Petal.Width - median(Petal.Width))^2)
anova(lm(abs.mean ~ Species, data = x)) # Levene's test
## Analysis of Variance Table
##
## Response: abs.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## Species 2 0.53 0.2648 19.6 2.7e-08 ***
## Residuals 147 1.98 0.0135
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm(abs.median ~ Species, data = x)) # Brown-Forsythe test
## Analysis of Variance Table
##
## Response: abs.median
## Df Sum Sq Mean Sq F value Pr(>F)
## Species 2 0.642 0.321 19.9 2.3e-08 ***
## Residuals 147 2.373 0.016
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm(squared.mean ~ Species, data = x)) # default SAS Levene's Test
## Analysis of Variance Table
##
## Response: squared.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## Species 2 0.100 0.0500 14.8 1.4e-06 ***
## Residuals 147 0.497 0.0034
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm(squared.median ~ Species, data = x)) # Who-Knows-Whose Test
## Analysis of Variance Table
##
## Response: squared.median
## Df Sum Sq Mean Sq F value Pr(>F)
## Species 2 0.096 0.0478 13.6 3.7e-06 ***
## Residuals 147 0.515 0.0035
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To show that the first two above reproduce leveneTest:
require(car)
leveneTest(Petal.Width ~ Species, data = iris, center = mean)
## Levene's Test for Homogeneity of Variance (center = mean)
## Df F value Pr(>F)
## group 2 19.6 2.7e-08 ***
## 147
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
leveneTest(Petal.Width ~ Species, data = iris, center = median)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 19.9 2.3e-08 ***
## 147
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since one often has a linear model to hand with the residuals (mean) ready to go, it's often more convenient to do
iris.lm <- lm(Petal.Width ~ Species, data = iris)
anova(lm(residuals(iris.lm)^2 ~ iris$Species))
## Analysis of Variance Table
##
## Response: residuals(iris.lm)^2
## Df Sum Sq Mean Sq F value Pr(>F)
## iris$Species 2 0.100 0.0500 14.8 1.4e-06 ***
## Residuals 147 0.497 0.0034
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Hence the answer at the top.
Upvotes: 3