Reputation: 483
Context: Stacked (long-format) data, laid out for conducting an ANOVA, is given in R as illustrated in the example subset below (the original data is not ordered):
IV_B1 IV_B2 IV_W DV
1 1 1 12
1 1 2 42
1 2 1 25
1 2 2 29
2 1 1 13
2 1 2 49
2 2 1 45
2 2 2 34
Goal: Compute a paired t-test with IV_W as the within factor and IV_B1 and IV_B2 as between factors; the pairings are therefore defined by IV_W, holding the IV_B1 x IV_B2 combination constant:
Pair one (P1): (IV_B1 = 1, IV_B2 = 1, IV_W = 1), (IV_B1 = 1, IV_B2 = 1, IV_W = 2)
Pair two (P2): (IV_B1 = 1, IV_B2 = 2, IV_W = 1), (IV_B1 = 1, IV_B2 = 2, IV_W = 2)
...
In total: P1 = [(1, 1, 1), (1, 1, 2)], P2 = [(1, 2, 1), (1, 2, 2)], P3 = [(2, 1, 1), (2, 1, 2)], P4 = [(2, 2, 1), (2, 2, 2)]; hence, in the given case, the manual command would be t.test(c(12, 25, 13, 45), c(42, 29, 49, 34), paired=TRUE).
Question: How to conduct such a paired t-test in R to get the following data:
Upvotes: 1
Views: 270
Reputation: 24272
Here is an alternative solution using reshape:
df <- structure(list(IV_B1 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), IV_B2 = c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), IV_W = c(1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L), DV = c(12L, 42L, 25L, 29L, 13L, 49L, 45L, 34L)), .Names = c("IV_B1",
"IV_B2", "IV_W", "DV"), class = "data.frame", row.names = c(NA,
-8L))
df
# IV_B1 IV_B2 IV_W DV
# 1 1 1 1 12
# 2 1 1 2 42
# 3 1 2 1 25
# 4 1 2 2 29
# 5 2 1 1 13
# 6 2 1 2 49
# 7 2 2 1 45
# 8 2 2 2 34
# Add an id column (assumes the rows are sorted so that each pair occupies two consecutive rows)
( df <- cbind(df, id=rep(1:(nrow(df)/2),each=2)) )
# IV_B1 IV_B2 IV_W DV id
# 1 1 1 1 12 1
# 2 1 1 2 42 1
# 3 1 2 1 25 2
# 4 1 2 2 29 2
# 5 2 1 1 13 3
# 6 2 1 2 49 3
# 7 2 2 1 45 4
# 8 2 2 2 34 4
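Because the question notes that the original data is not ordered, the row-position id above may not line up with the pairs in the real data. A minimal sketch of one alternative, assuming every IV_B1 x IV_B2 cell holds exactly one row per level of IV_W: derive the id from the between factors with interaction(). The name df_shuffled and the shuffled row order are made up for illustration.

```r
# Same example values as above, with the rows deliberately shuffled
df_shuffled <- data.frame(
  IV_B1 = c(2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L),
  IV_B2 = c(2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L),
  IV_W  = c(1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L),
  DV    = c(45L, 42L, 49L, 25L, 12L, 34L, 29L, 13L)
)

# One id per IV_B1 x IV_B2 cell, independent of row order.
# lex.order = TRUE numbers the cells with IV_B1 varying slowest,
# which reproduces the ids 1..4 used above.
df_shuffled$id <- as.integer(
  interaction(df_shuffled$IV_B1, df_shuffled$IV_B2,
              lex.order = TRUE, drop = TRUE)
)

# Optional: sort by pair and within-level to recover the layout shown above
df_sorted <- df_shuffled[order(df_shuffled$id, df_shuffled$IV_W), ]
df_sorted
```

With ids built this way, the reshape() call that follows should not depend on how the rows happen to be ordered.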
# From long to wide format
( df.wide <- reshape(df, idvar="id", v.names=c("IV_B1","IV_B2","DV"),
timevar = "IV_W", direction = "wide") )
# id IV_B1.1 IV_B2.1 DV.1 IV_B1.2 IV_B2.2 DV.2
# 1 1 1 1 12 1 1 42
# 3 2 1 2 25 1 2 29
# 5 3 2 1 13 2 1 49
# 7 4 2 2 45 2 2 34
# Paired t-test
tt <- t.test(df.wide$DV.1, df.wide$DV.2, paired=TRUE)
# Calculate differences
difs <- df.wide$DV.1-df.wide$DV.2
# Mean difference
( mean_diff <- tt$estimate )
# mean of the differences
# -14.75
mean(difs)
# Standard error of the difference
( se_mean_diff <- sd(difs)/sqrt(length(difs)) )
# [1] 11.04064
# T statistic
( t_stat <- tt$statistic )  # named t_stat so that T (shorthand for TRUE) is not masked
# t
# -1.335973
mean_diff/se_mean_diff
# Degrees of freedom
( dof <- tt$parameter )
# df
# 3
# t-test p-value
( pv <- tt$p.value )
# [1] 0.2738612
2 * (1 - pt(abs(t_stat), dof))
# 95% confidence intervals
( CI <- tt$conf.int )
# [1] -49.88626 20.38626
# attr(,"conf.level")
# [1] 0.95
c(mean_diff - qt(0.975,dof)*se_mean_diff,
mean_diff + qt(0.975,dof)*se_mean_diff)
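Since the question asks for the results as data, the pieces computed above could be gathered into a single row. This is only a sketch: the column names are illustrative, and the two input vectors are the example values from the question rather than the full data set.

```r
# Example values from the question: DV at IV_W == 1 and at IV_W == 2
x1 <- c(12, 25, 13, 45)
x2 <- c(42, 29, 49, 34)

tt   <- t.test(x1, x2, paired = TRUE)
difs <- x1 - x2

# unname() drops the labels that t.test attaches to its components
result <- data.frame(
  mean_diff = unname(tt$estimate),
  se        = sd(difs) / sqrt(length(difs)),
  t         = unname(tt$statistic),
  df        = unname(tt$parameter),
  p_value   = tt$p.value,
  ci_lower  = tt$conf.int[1],
  ci_upper  = tt$conf.int[2]
)
result
```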
Upvotes: 2
Reputation: 3026
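# DV values measured at IV_W == 1, one per IV_B1 x IV_B2 cell
# (P1 and P2 here are the two sides of the pairs, not the pairs P1..P4 from the question)
P1 = subset(df, (IV_B1 == 1 & IV_B2 == 1 & IV_W == 1) |
(IV_B1 == 1 & IV_B2 == 2 & IV_W == 1) |
(IV_B1 == 2 & IV_B2 == 1 & IV_W == 1) |
(IV_B1 == 2 & IV_B2 == 2 & IV_W == 1))
P1 = P1$DV
# DV values measured at IV_W == 2; subset() keeps the row order of df,
# so the cells must appear in the same order for both levels of IV_W
P2 = subset(df, (IV_B1 == 1 & IV_B2 == 1 & IV_W == 2) |
(IV_B1 == 1 & IV_B2 == 2 & IV_W == 2) |
(IV_B1 == 2 & IV_B2 == 1 & IV_W == 2) |
(IV_B1 == 2 & IV_B2 == 2 & IV_W == 2))
P2 = P2$DV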
TT = t.test(P1, P2, paired=TRUE)
pval = TT$p.value
mdiff = TT$estimate
dof = TT$parameter  # not named df, which would overwrite the data frame
tval = TT$statistic
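The subsetting above pairs values by row position. A possible order-independent variant, sketched under the assumption that each IV_B1 x IV_B2 cell has exactly one row per level of IV_W, is to join the two levels with merge(). The names dat, w1, w2 and pairs are illustrative.

```r
# Example data from the question
dat <- data.frame(
  IV_B1 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  IV_B2 = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
  IV_W  = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  DV    = c(12L, 42L, 25L, 29L, 13L, 49L, 45L, 34L)
)

# Split by the within factor, keeping the between factors as join keys
w1 <- dat[dat$IV_W == 1, c("IV_B1", "IV_B2", "DV")]
w2 <- dat[dat$IV_W == 2, c("IV_B1", "IV_B2", "DV")]

# One row per IV_B1 x IV_B2 cell, with columns DV.1 and DV.2
pairs <- merge(w1, w2, by = c("IV_B1", "IV_B2"), suffixes = c(".1", ".2"))

TT <- t.test(pairs$DV.1, pairs$DV.2, paired = TRUE)
TT
```

Because the pairing is done on the key columns, shuffling the rows of dat should leave the test result unchanged.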
Upvotes: 1