Braden Anderson

Reputation: 161

Scipy ttest_ind permutation test changed by equal_var parameter?

For scipy.stats.ttest_ind I thought that setting the permutations parameter to any positive number would result in a permutation test being performed. I also thought that if a permutation test was being performed, no assumptions would be made regarding variance of the two populations, therefore the equal_var parameter should be ignored.

I found that the equal_var parameter does change the resulting p-value and test statistic when the permutation test is being used. A simple reproducible example is shown below:

from scipy import stats

sample1 = [34, 1200, 23, 50, 60, 50, 0, 0, 30, 89, 0, 300, 400, 20, 10, 0]
sample2 = [20, 10, 5, 0, 30, 50, 0, 100, 110, 0, 40, 10, 3, 0]

# equal_var=True (the default), three independent runs
t1, p1 = stats.ttest_ind(a=sample1, b=sample2, permutations=10_000_000)
t2, p2 = stats.ttest_ind(a=sample1, b=sample2, permutations=10_000_000)
t3, p3 = stats.ttest_ind(a=sample1, b=sample2, permutations=10_000_000)

# equal_var=False, three independent runs
t4, p4 = stats.ttest_ind(a=sample1, b=sample2, permutations=10_000_000, equal_var=False)
t5, p5 = stats.ttest_ind(a=sample1, b=sample2, permutations=10_000_000, equal_var=False)
t6, p6 = stats.ttest_ind(a=sample1, b=sample2, permutations=10_000_000, equal_var=False)

The output of running the code above (test statistics and p-values) is:

[Image: table of the resulting test statistics and p-values; not reproduced here]

In all three runs, the permutation test with equal_var = True returns a p-value of around 0.15, while in all three runs the same permutation test with equal_var = False returns a p-value of around 0.10.

Can anyone please help me understand what is happening here, and why the equal_var parameter is changing the results of the permutation test?

It was my understanding that the permutation test randomly assigns each data point to one of the two groups (because under the null we can do this), calculates the difference in means, and repeats this process permutations times. The p-value is then the number of permutations in which the difference in means was as extreme as or more extreme than the one in the sample we actually took, divided by the total number of permutations used. Based on this, I am having a hard time understanding why the equal_var parameter would change the p-value as we see in the example above.

Thank you!

Upvotes: 2

Views: 1096

Answers (1)

TMBailey

Reputation: 667

scipy.stats.ttest_ind calculates a test statistic (t) from the samples, and then performs an inferential test on it (the p-value). The permutations parameter affects the inferential test but does not alter the test statistic. The equal_var parameter affects the test statistic regardless of the permutations parameter.

In calculating the test statistic, if equal_var is True (the default) then the test statistic is based on a single pooled estimate of variance calculated across both samples. Otherwise (Welch's t-test) the test statistic is based on the sum of the estimated variances of the two separate sample means.
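To make the difference between the two statistics concrete, here is a minimal standard-library sketch. The helper name t_statistic is hypothetical (it is not part of SciPy), and this is an illustration of the textbook pooled and Welch formulas rather than SciPy's actual code.

```python
import math
import statistics


def t_statistic(a, b, equal_var=True):
    """Two-sample t statistic: pooled (equal_var=True) or Welch (equal_var=False).

    Hypothetical helper for illustration; not SciPy's implementation.
    """
    n1, n2 = len(a), len(b)
    mean_diff = statistics.fmean(a) - statistics.fmean(b)
    v1 = statistics.variance(a)  # sample variance (n - 1 denominator)
    v2 = statistics.variance(b)
    if equal_var:
        # One pooled variance estimate shared by both samples.
        pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    else:
        # Sum of the variances of the two sample means.
        std_err = math.sqrt(v1 / n1 + v2 / n2)
    return mean_diff / std_err


sample1 = [34, 1200, 23, 50, 60, 50, 0, 0, 30, 89, 0, 300, 400, 20, 10, 0]
sample2 = [20, 10, 5, 0, 30, 50, 0, 100, 110, 0, 40, 10, 3, 0]

t_pooled = t_statistic(sample1, sample2, equal_var=True)
t_welch = t_statistic(sample1, sample2, equal_var=False)
print(t_pooled, t_welch)
```

The numerator is the same difference in means in both cases; only the standard error in the denominator changes. When the two samples have the same size the two standard errors coincide, so the statistics differ here only because the sample sizes (16 and 14) and the sample variances are unequal.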

In calculating the inferential test, if permutations is specified then the inferential test is based on an empirical permutation distribution of the test statistics that could result from the same data pool with each value assigned to sample 1 or sample 2 at random. The statistic for each random assignment is calculated with the same equal_var setting as the observed statistic, which is why equal_var still changes the p-value. This inferential test does not make strong assumptions about the populations from which the data samples come.
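The resampling described above can be sketched with the standard library alone. The names t_statistic and permutation_pvalue, the n_resamples and seed arguments, and the two-sided counting rule are all assumptions made for this illustration; SciPy's internal procedure may differ in details such as how ties are handled.

```python
import math
import random
import statistics


def t_statistic(a, b, equal_var=True):
    """Pooled (equal_var=True) or Welch (equal_var=False) t statistic."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    if equal_var:
        pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    else:
        std_err = math.sqrt(v1 / n1 + v2 / n2)
    return (statistics.fmean(a) - statistics.fmean(b)) / std_err


def permutation_pvalue(a, b, equal_var=True, n_resamples=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value (hypothetical helper)."""
    rng = random.Random(seed)
    observed = abs(t_statistic(a, b, equal_var))
    pool = list(a) + list(b)
    n1 = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pool)  # random reassignment of the pooled data
        # Recompute the SAME kind of statistic on the reshuffled groups.
        t = t_statistic(pool[:n1], pool[n1:], equal_var)
        if abs(t) >= observed:
            extreme += 1
    return extreme / n_resamples


sample1 = [34, 1200, 23, 50, 60, 50, 0, 0, 30, 89, 0, 300, 400, 20, 10, 0]
sample2 = [20, 10, 5, 0, 30, 50, 0, 100, 110, 0, 40, 10, 3, 0]

p_pooled = permutation_pvalue(sample1, sample2, equal_var=True)
p_welch = permutation_pvalue(sample1, sample2, equal_var=False)
print(p_pooled, p_welch)
```

For a fixed data pool the pooled statistic is a monotone function of the difference in means, so the equal_var=True variant ranks the random assignments the same way as the difference-in-means test described in the question. The Welch statistic does not have this property, so it can rank the assignments differently and give a different p-value.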

If permutations is not specified then the inferential test is based on a theoretical distribution (Student's t) of potential test statistics that could arise if the two data samples come from populations which conform to certain parametric assumptions (normal distribution, and so forth). Slightly different theoretical distributions are used depending on the equal_var parameter, because equal_var affects the "degrees of freedom" assumed.
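As a rough illustration of the degrees-of-freedom point, the sketch below computes the value used in each case: n1 + n2 - 2 for the pooled test and the Welch-Satterthwaite approximation for the unpooled one. The function name degrees_of_freedom is hypothetical, and this follows the textbook formulas rather than quoting SciPy's source.

```python
import statistics


def degrees_of_freedom(a, b, equal_var=True):
    """Degrees of freedom of the reference t distribution (illustrative helper)."""
    n1, n2 = len(a), len(b)
    if equal_var:
        # Pooled test: one variance estimate from n1 + n2 - 2 residual df.
        return n1 + n2 - 2
    # Welch-Satterthwaite approximation for unequal variances.
    q1 = statistics.variance(a) / n1  # estimated variance of the first mean
    q2 = statistics.variance(b) / n2  # estimated variance of the second mean
    return (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))


sample1 = [34, 1200, 23, 50, 60, 50, 0, 0, 30, 89, 0, 300, 400, 20, 10, 0]
sample2 = [20, 10, 5, 0, 30, 50, 0, 100, 110, 0, 40, 10, 3, 0]

df_pooled = degrees_of_freedom(sample1, sample2, equal_var=True)
df_welch = degrees_of_freedom(sample1, sample2, equal_var=False)
print(df_pooled, df_welch)
```

The Welch value is generally not an integer and always lies between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2, so the unpooled test uses a t distribution with somewhat heavier tails.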

Upvotes: 1
