rrrookie

Reputation: 11

2-sample independent t-test where each of the two columns is in a different data frame

I need to run a 2-sample independent t-test, comparing Column1 to Column2. But Column1 is in DataframeA, and Column2 is in DataframeB. How should I do this?

Just in case it's relevant (feel free to ignore): I am a true beginner. My experience with R so far has been limited to running 2-sample matched t-tests within the same data frame by doing the following:

t.test(response ~ Column1,
       data = (Dataframe1 %>%
                 gather(key = "Column1", value = "response", "Column1", "Column2")),
       paired = TRUE)

Upvotes: 1

Views: 1730

Answers (2)

Álvaro

Reputation: 798

TL;DR


t_test_result = t.test(DataframeA$Column1, DataframeB$Column2, paired=TRUE)

Explanation


If the data is paired, I assume that both data frames will have the same number of observations (the same number of rows). You can check this with nrow(DataframeA) == nrow(DataframeB).

You can think of each column of a data frame as a vector (an ordered list of values). You used t.test with a formula (y ~ x), which essentially says: given the data frame specified in data, perform a t-test to assess the significance of the difference in means of the variable response between the paired groups in Column1.

Another way of thinking about this is to take the data in data and separate it into two vectors: one with the observations for the first group of Column1, and one for the second group. Then, for each vector, you compute the mean and standard deviation and apply the appropriate formula, which gives you the t statistic and hence the p-value.
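To make the "compute the mean and standard deviation, then apply the formula" step concrete, here is a rough sketch for the unpaired (Welch) case, which is what t.test() runs by default when you pass two vectors. The vectors x and y are simulated stand-ins, not your data, and the variable names are just for illustration.

```r
# Simulated stand-ins for the two groups (hypothetical data)
set.seed(1)
x <- rnorm(20, mean = 5, sd = 1)
y <- rnorm(25, mean = 6, sd = 2)

# Per-group summaries
m1 <- mean(x); s1 <- sd(x); n1 <- length(x)
m2 <- mean(y); s2 <- sd(y); n2 <- length(y)

# Welch t statistic: difference in means over its standard error
se     <- sqrt(s1^2 / n1 + s2^2 / n2)
t_stat <- (m1 - m2) / se

# Welch-Satterthwaite approximation of the degrees of freedom
df <- se^4 / ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_stat), df)

# The built-in function should agree with the manual computation
builtin <- t.test(x, y)
all.equal(unname(builtin$statistic), t_stat)
all.equal(unname(builtin$parameter), df)
all.equal(builtin$p.value, p_val)
```

A paired test works differently: it takes the row-by-row differences and runs a one-sample test on them, which is why it needs the same number of rows in both groups.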

Thus, you can just extract those 2 vectors separately and provide them as arguments to the t.test() function. I hope it was beginner-friendly enough ^^ otherwise let me know


EDIT: a few additions (I was going to reply in the comments but realized I did not have space hehe)

Regarding what @Ashish did to turn it into a Welch's test, I'd say it was setting var.equal = FALSE. The paired parameter controls whether the t-test is run on paired samples or not, and since your data frames have an unequal number of rows, I suspect the observations are not matched.
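A small sketch of the difference, assuming two made-up data frames with different numbers of rows (the names match the question, but the values are simulated):

```r
# Simulated data frames with unequal row counts (hypothetical values)
set.seed(42)
DataframeA <- data.frame(Column1 = rnorm(30, mean = 10, sd = 2))
DataframeB <- data.frame(Column2 = rnorm(45, mean = 11, sd = 3))

# Independent samples, variances not assumed equal: Welch's t-test
welch <- t.test(DataframeA$Column1, DataframeB$Column2,
                paired = FALSE, var.equal = FALSE)
welch$method  # reports which test was actually run

# Asking for a paired test on vectors of different lengths fails,
# so the error is caught here instead of stopping the script
paired_attempt <- try(
  t.test(DataframeA$Column1, DataframeB$Column2, paired = TRUE),
  silent = TRUE
)
inherits(paired_attempt, "try-error")
```

Both paired = FALSE and var.equal = FALSE are already the defaults, so spelling them out mainly documents the intent.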

As for the Cohen's d effect size, you can check this stats exchange question, from which I copy the code:

For context, m1 and m2 are the groups' means (which you can get with m1 = mean(DataframeA$Column1)), s1 and s2 are the standard deviations (s2 = sd(DataframeB$Column2)), and n1 and n2 are the sample sizes (n2 = length(DataframeB$Column2)).

lx <- n1 - 1                   # degrees of freedom for group 1 (n1 - 1)
ly <- n2 - 1                   # degrees of freedom for group 2 (n2 - 1)

md  <- abs(m1 - m2)            # mean difference (numerator)
csd <- lx * s1^2 + ly * s2^2   # variances weighted by degrees of freedom
csd <- csd / (lx + ly)         # pooled variance
csd <- sqrt(csd)               # pooled (common) standard deviation

cd  <- md / csd                # Cohen's d
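If you want to reuse this, one option is to wrap the same pooled-SD computation in a function that takes the two vectors directly. cohens_d is a hypothetical helper name, not a base R function, and the toy vectors are made up:

```r
# Hypothetical helper: Cohen's d using the pooled standard deviation
cohens_d <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  # Pooled variance: each group's variance weighted by its degrees of freedom
  pooled_var <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
  abs(mean(x) - mean(y)) / sqrt(pooled_var)
}

# Toy vectors standing in for DataframeA$Column1 and DataframeB$Column2
x <- c(2, 4, 6, 8)
y <- c(5, 7, 9, 11, 13)
d <- cohens_d(x, y)
d
```

With your data the call would be cohens_d(DataframeA$Column1, DataframeB$Column2). Note that the pooled SD treats the two groups as having a common variance, so take it as a rough guide if the variances differ a lot.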

Upvotes: 3

Ashish Baid

Reputation: 533

This should work for you

res = t.test(DataframeA$Column1, DataframeB$Column2, alternative = "two.sided", var.equal = FALSE)
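Once the test has run, the result is a list-like object you can pull values out of. A quick sketch with simulated data frames (the values are made up; only the names come from the question):

```r
# Simulated data frames (hypothetical values)
set.seed(7)
DataframeA <- data.frame(Column1 = rnorm(12, mean = 50, sd = 5))
DataframeB <- data.frame(Column2 = rnorm(18, mean = 53, sd = 8))

res <- t.test(DataframeA$Column1, DataframeB$Column2,
              alternative = "two.sided", var.equal = FALSE)

res$statistic  # t value
res$parameter  # degrees of freedom (Welch approximation)
res$p.value    # two-sided p-value
res$conf.int   # confidence interval for the difference in means
res$estimate   # the two sample means
```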

Upvotes: 0
