Reputation: 11
I need to run a 2-sample independent t-test, comparing Column1 to Column2. But Column1 is in DataframeA, and Column2 is in DataframeB. How should I do this?
Just in case relevant (feel free to ignore): I am a true beginner. My experience with R so far has been limited to running 2-sample matched t-tests within the same data frame by doing the following:
t.test(response ~ Column1,
data = (Dataframe1 %>%
gather(key = "Column1", value = "response", "Column1", "Column2")),
paired = TRUE)
Upvotes: 1
Views: 1730
Reputation: 798
t_test_result = t.test(DataframeA$Column1, DataframeB$Column2, paired=TRUE)
If the data is paired, I assume that both dataframes will have the same number of observations (same number of rows). You can check this with nrow(DataframeA) == nrow(DataframeB).
You can think of each column of a dataframe as a vector (an ordered list of values). The way that you have used t.test is with a formula (y ~ x), and you were essentially saying: given the dataframe specified in data, perform a t-test to assess the significance of the difference in means of the variable response between the paired groups in Column1.
Another way of thinking about this is to take the data in data and separate it into two vectors: the vector with observations for the first group of Column1, and the one for the second group. Then, for each vector, you compute the mean and standard deviation and apply the appropriate formula, which gives you the t statistic and hence the p-value.
Thus, you can just extract those 2 vectors separately and provide them as arguments to the t.test() function. I hope it was beginner-friendly enough ^^ otherwise let me know
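To make the "compute the mean and standard deviation and apply the formula" idea concrete, here is a minimal sketch. The numbers are invented purely for illustration, and it uses the unpaired (Welch) formula, which is what t.test() applies by default when you give it two vectors:

```r
# Invented example data (assumption: not from the question).
DataframeA <- data.frame(Column1 = c(5.1, 4.9, 6.2, 5.8, 5.5, 6.0))
DataframeB <- data.frame(Column2 = c(4.2, 4.8, 5.0, 4.4, 4.9))

# Extract each column as a plain numeric vector.
x <- DataframeA$Column1
y <- DataframeB$Column2

# Welch t statistic by hand: difference in means divided by its standard error.
se <- sqrt(var(x) / length(x) + var(y) / length(y))
t_manual <- (mean(x) - mean(y)) / se

# The same test via t.test(); paired = FALSE and var.equal = FALSE are the defaults.
result <- t.test(x, y)

# The two t statistics should agree.
print(all.equal(unname(result$statistic), t_manual))
```

The point is only that t.test() does not care where the two vectors come from, so they can live in different data frames.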
EDIT: a few additions (I was going to reply in the comments but realized I did not have space hehe)
Regarding what @Ashish did in order to turn it into a Welch's test, I'd say it was to set var.equal = FALSE. The paired parameter controls whether the t-test is run on paired samples or not, and since your data frames have an unequal number of rows, I suspect the observations are not matched.
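A small sketch of how those two arguments behave, again with invented vectors of unequal length (x and y stand in for DataframeA$Column1 and DataframeB$Column2):

```r
# Invented example data with different numbers of observations.
x <- c(5.1, 4.9, 6.2, 5.8, 5.5, 6.0)
y <- c(4.2, 4.8, 5.0, 4.4, 4.9)

# Welch's test: does not assume equal variances (this is also the default).
welch <- t.test(x, y, var.equal = FALSE)

# Classic Student's t-test: assumes both groups share one variance.
student <- t.test(x, y, var.equal = TRUE)

# Student's test uses n1 + n2 - 2 degrees of freedom; Welch's is never larger.
print(student$parameter)
print(welch$parameter)

# A paired test needs one-to-one matched observations, so with unequal
# lengths it is expected to fail; capture the error message instead of stopping.
paired_attempt <- tryCatch(
  t.test(x, y, paired = TRUE),
  error = function(e) conditionMessage(e)
)
print(paired_attempt)
```

So for unmatched data from two data frames, leave paired at its default of FALSE and choose var.equal depending on whether you are willing to assume equal variances.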
As for the Cohen's d effect size, you can check this stats exchange question, from which I copy the code:
For context, m1 and m2 are the groups' means (which you can get with m1 = mean(DataframeA$Column1)), s1 and s2 are the standard deviations (s2 = sd(DataframeB$Column2)), and n1 and n2 are the sample sizes (n2 = length(DataframeB$Column2)).
lx <- n1 - 1 # degrees of freedom for group 1 (n1 observations)
ly <- n2 - 1 # degrees of freedom for group 2 (n2 observations)
md <- abs(m1 - m2) ## mean difference (numerator)
csd <- lx * s1^2 + ly * s2^2
csd <- csd / (lx + ly)
csd <- sqrt(csd) ## common (pooled) sd computation
cd <- md / csd ## Cohen's d
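If it helps, the same steps can be wrapped into a small function that takes the two vectors directly. This is only a sketch: cohens_d is a name I made up (it is not a base R function), and it uses the pooled standard deviation exactly as in the snippet above:

```r
# Hypothetical helper (not part of base R): Cohen's d with a pooled SD.
cohens_d <- function(x, y) {
  lx <- length(x) - 1            # degrees of freedom for group 1
  ly <- length(y) - 1            # degrees of freedom for group 2
  md <- abs(mean(x) - mean(y))   # absolute mean difference (numerator)
  # Pooled variance: df-weighted average of the two sample variances.
  pooled_var <- (lx * var(x) + ly * var(y)) / (lx + ly)
  md / sqrt(pooled_var)
}

# Toy check with invented numbers: means 2 and 3, both SDs equal to 1.
d <- cohens_d(c(1, 2, 3), c(2, 3, 4))
print(d)
```

With your data you would call it as cohens_d(DataframeA$Column1, DataframeB$Column2).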
Upvotes: 3
Reputation: 533
This should work for you
res = t.test(DataframeA$Column1, DataframeB$Column2, alternative = "two.sided", var.equal = FALSE)
Upvotes: 0