Reputation: 175
I have two data frames and would like to run t.test on each pair of matching columns. Both are subsets of one large data frame, so the column names are identical and in the same order (ncol = ~20000), with nrow(df1) = 25 and nrow(df2) = 23.
Example:
treatment<-matrix(rnorm(50), ncol=10)
control<-matrix(rnorm(50), ncol=10)
treatment
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.23442246 1.02256703 1.0499998 0.2913643 -1.2083822 0.3778403
[2,] -0.68888047 -0.03961717 -0.9978793 -0.9792061 -0.1831634 0.6140542
[3,] -1.88273887 -0.49701513 0.1845197 0.4385338 1.2249121 0.5444027
[4,] 1.21359446 0.87333933 0.5615304 0.3803339 1.1294489 -0.8777454
[5,] -0.02908159 -1.50296138 0.4624656 0.1335046 1.1665818 -0.4475185
[,7] [,8] [,9] [,10]
[1,] 0.5987723 0.5910937 0.4334874 -1.4198250
[2,] 0.2027346 0.8078187 -1.0573069 1.0727554
[3,] 0.5490159 0.5109912 1.7247428 1.7745333
[4,] 0.3044544 0.6476548 1.1959365 -0.1220841
[5,] 1.8681375 0.8451147 0.4283893 0.1044125
control
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.6712834 -0.3775649 0.7741285 0.51224345 0.24128336 1.02580198
[2,] 0.3894112 -0.1835289 0.4982122 1.73512459 0.08991013 -0.04406897
[3,] 1.7068503 0.7909355 -0.3341426 0.08780239 -1.11563321 2.09984105
[4,] -0.7634818 -1.3672888 0.2161816 -0.65170516 0.81247509 1.68008404
[5,] 0.5787616 0.1704100 -0.3166737 0.90167409 -2.34854292 0.31571255
[,7] [,8] [,9] [,10]
[1,] -1.6111883 0.1019497 -0.1975491 -0.3776000
[2,] 0.7533329 1.1540590 1.0050663 2.0137347
[3,] 1.2224161 1.4411853 -0.4801494 -0.3891034
[4,] 0.1905461 0.9767801 -0.1442578 -0.9946735
[5,] -1.9581454 -0.2874181 -1.0421440 -0.6177782
I did some searching on SO and came across mapply():
mapply(t.test,treatment,control)
Error in t.test.default(dots[[1L]][[1L]], dots[[2L]][[1L]]) :
not enough 'x' observations
But when I do t.test on single columns:
t.test(treatment[,1],control[,1])
Welch Two Sample t-test
data: treatment[, 1] and control[, 1]
t = -1.1541, df = 7.492, p-value = 0.284
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.2577187 0.7635152
sample estimates:
mean of x mean of y
-0.2305368 0.5165649
What is wrong here?
Upvotes: 1
Views: 892
Reputation: 93938
treatment and control, as matrix objects, are essentially vectors (like c(1,2,3)) with dimensions attached, so mapply iterates over their individual elements and tries to run a t.test comparing one number against one number. E.g.:
treatment[1]
#[1] 0.7545039
control[1]
#[1] -0.3926361
t.test(treatment[1],control[1])
#Error in t.test.default(dots[[1L]][[1L]], dots[[2L]][[1L]]) :
# not enough 'x' observations
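The element-by-element behaviour can be checked on a small example. This is only an illustrative sketch: the matrix m and the result names are made up for the demonstration and are not part of the original data.

```r
# Hypothetical 2 x 3 matrix, used only to show how mapply iterates
m <- matrix(1:6, ncol = 3)

# A matrix is a vector with a dim attribute, so it has 6 elements
n_elements <- length(m)

# mapply visits each element in turn: every call sees a single number
element_lengths <- mapply(function(x, y) length(x), m, m)

# A data frame is a list of columns, so mapply visits each column instead
column_lengths <- mapply(function(x, y) length(x),
                         as.data.frame(m), as.data.frame(m))

print(n_elements)       # 6
print(element_lengths)  # six 1s, one per matrix element
print(column_lengths)   # 2 for each of V1, V2, V3
```

Since every call receives a single number, t.test has only one observation per group, which is what the "not enough 'x' observations" error reports.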
If you convert your matrices to data.frame objects, each column will be treated as a single object and mapply will work just fine:
mapply(t.test,as.data.frame(treatment),as.data.frame(control))
# V1
#statistic -0.7829546
#parameter 7.698139
#p.value 0.4570611
#etc etc
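In this case, I'm nearly sure using Map is more appropriate for readability's sake, since it returns a list with one complete test result per column instead of simplifying everything into a matrix: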
Map(t.test,as.data.frame(treatment),as.data.frame(control))
#$V1
#
# Welch Two Sample t-test
#
#data: dots[[1L]][[1L]] and dots[[2L]][[1L]]
#t = -0.783, df = 7.698, p-value = 0.4571
#alternative hypothesis: true difference in means is not equal to 0
#95 percent confidence interval:
# -1.525349 0.756036
#sample estimates:
# mean of x mean of y
#-0.31246928 0.07218723
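With ~20000 columns, the usual next step is probably to keep one number per test rather than printing every result. A minimal sketch, assuming the p-value is what is wanted; the seed and the names tests, p_values and p_by_index are illustrative and not from the original post:

```r
set.seed(42)  # assumption: fixed seed so the example is reproducible
treatment <- matrix(rnorm(50), ncol = 10)
control   <- matrix(rnorm(50), ncol = 10)

# One Welch t-test per pair of matching columns; returns a named list
tests <- Map(t.test, as.data.frame(treatment), as.data.frame(control))

# Extract the p-value from each test into a named numeric vector
p_values <- vapply(tests, function(tt) tt$p.value, numeric(1))

# Alternative without converting to data frames: loop over column indices
p_by_index <- vapply(
  seq_len(ncol(treatment)),
  function(i) t.test(treatment[, i], control[, i])$p.value,
  numeric(1)
)

print(p_values)
```

Both approaches run the same tests, so the two vectors agree apart from the V1, V2, ... names that as.data.frame adds. The index-based version also works directly on matrices, which avoids the conversion step.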
Upvotes: 2