Reputation:
I'm looking to create a user-defined contrast on my data. In brief, the data is organized in a dataframe, with each row having 1 of 4 possible conditions, a proportion of correct answers on a test, and 2 variables called "Schedule" and "Cluster." The head of my data looks like this:
  Subjects Condition        PC    Schedule Cluster
1        1         1 0.5555556 Interleaved Similar
2        2         1 0.3425926 Interleaved Similar
3        3         1 0.7129630 Interleaved Similar
4        4         1 0.5000000 Interleaved Similar
5        5         1 0.6296296 Interleaved Similar
6        6         1 0.6851852 Interleaved Similar
There are two main contrasts I want to run. The first compares condition 1 to the mean of conditions 2, 3, and 4. The second compares condition 4 to the mean of conditions 2 and 3. I coded my two contrasts like this:
contrast1 = c(1, -1/3, -1/3, -1/3)
contrast2 = c(0, -1/2, -1/2, 1)
I then put them into a matrix:
cond.contrasts = matrix(c(contrast1, contrast2), ncol = 2)
Per advice I saw elsewhere, I got the generalized inverse of this matrix with ginv(), a function from the MASS package:
cond.contrasts = t(ginv(cond.contrasts))
show(cond.contrasts)
[,1] [,2]
[1,] 0.75 0.0000000
[2,] -0.25 -0.3333333
[3,] -0.25 -0.3333333
[4,] -0.25 0.6666667
Note there are only two contrasts here. However, my output looks like this:
lm.experiment = lm(PC ~ Condition, PC)
summary(lm.experiment)
Call:
lm(formula = PC ~ Condition, data = PC)
Residuals:
Min 1Q Median 3Q Max
-0.22099 -0.12069 -0.00926 0.11443 0.35117
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.5438470  0.0136786  39.759   <2e-16 ***
Condition1   0.0263110  0.0312175   0.843    0.401
Condition2   0.0279084  0.0335882   0.831    0.408
Condition3  -0.0007032  0.0276090  -0.025    0.980
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1472 on 112 degrees of freedom
Multiple R-squared: 0.01234, Adjusted R-squared: -0.01412
F-statistic: 0.4663 on 3 and 112 DF, p-value: 0.7064
If I'm understanding this right, my contrasts should be represented by the "Condition1" and "Condition2" coefficients. However, I have no idea what "Condition3" refers to. If I ask R to show me the contrasts directly, it gives me this:
> show(contrasts(PC$Condition))
[,1] [,2] [,3]
1 0.75 0.0000000 8.326673e-17
2 -0.25 -0.3333333 -7.071068e-01
3 -0.25 -0.3333333 7.071068e-01
4 -0.25 0.6666667 -2.498002e-16
Where does the third column come from? Have I done something wrong?
Upvotes: 2
Views: 1247
Reputation: 81743
If you specify the contrasts outside the lm function, R automatically pads the contrast matrix to the maximum number of contrasts. In your example, one contrast is added, since 4 factor levels allow for 3 orthogonal contrasts.
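A minimal sketch of this padding, using a made-up four-level factor f as a stand-in for Condition (f and its size are assumptions for illustration, not the asker's data). Assigning a two-column matrix with contrasts<- leaves the factor with a three-column contrast matrix:

```r
library(MASS)  # for ginv()

# Hypothetical four-level factor standing in for Condition
f <- factor(rep(1:4, each = 3))

# The two user-defined contrasts from the question
contrast1 <- c(1, -1/3, -1/3, -1/3)
contrast2 <- c(0, -1/2, -1/2, 1)
cond.contrasts <- t(ginv(cbind(contrast1, contrast2)))

# Assigning outside lm(): R fills in up to nlevels(f) - 1 columns
contrasts(f) <- cond.contrasts
dim(contrasts(f))  # 4 rows, 3 columns
```

The first two columns are the ones supplied; the third is generated by R so that the matrix spans all three degrees of freedom.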
However, you can use the contrasts argument of lm to override the default behavior. In this case, the specified contrast matrix is used as is, and no additional contrasts are added.
The command:
lm(PC ~ Condition, PC, contrasts = list(Condition = cond.contrasts))
This means that you want to use the contrast matrix cond.contrasts for the factor Condition.
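As a rough check on simulated data (the data frame dat, the seed, and the equal group sizes are invented for illustration; only the contrast vectors come from the question), the fit below should report an intercept plus exactly two Condition coefficients. The sketch also shows the how.many argument of contrasts<-, which is another way to stop R from adding the third column when the contrasts are set outside lm:

```r
library(MASS)  # for ginv()

set.seed(1)  # arbitrary seed, for reproducibility only

# Simulated stand-in for the asker's data frame (values are random)
dat <- data.frame(
  Condition = factor(rep(1:4, each = 29)),
  PC = runif(116)
)

contrast1 <- c(1, -1/3, -1/3, -1/3)
contrast2 <- c(0, -1/2, -1/2, 1)
cond.contrasts <- t(ginv(cbind(contrast1, contrast2)))

# Option 1: pass the matrix through lm()'s contrasts argument
fit1 <- lm(PC ~ Condition, data = dat,
           contrasts = list(Condition = cond.contrasts))
names(coef(fit1))  # intercept plus two Condition terms

# Option 2: set the contrasts on the factor, keeping only two columns
contrasts(dat$Condition, how.many = 2) <- cond.contrasts
fit2 <- lm(PC ~ Condition, data = dat)

# Both routes build the same design matrix, so the estimates agree
all.equal(unname(coef(fit1)), unname(coef(fit2)))
```

Either route drops the model from three Condition coefficients to two, so the remaining degree of freedom goes into the residuals instead of an unrequested contrast.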
Upvotes: 1