ampersander

Reputation: 288

P-value in scipy.stats does not reflect the reality

I am not sure whether this is a question for Stack Overflow or for Math Stack Exchange.

I have data about the cost of crashes of cars A, and the data about the cost of crashes of cars B.

There were 15 992 crashes of type A, with a total cost of 19 890 980. The average cost of a crash of cars A was 1243.808.

Then, there were 2760 crashes of type B, with a total cost of 4 255 390. The average cost of a crash of cars B was 1541.808.

It is apparent that the mean cost of a crash of cars A should be lower than that of cars B. I want to test this using a t-test. The null hypothesis is "the means are equal", with a significance level (alpha) of 5%.

However, when I run the following in Python

from scipy.stats import ttest_ind

ttest_ind(table[B], table2[A], alternative="less", equal_var=False)

The result I get is this (the p-value would indicate that the mean cost of a crash of cars B is NOT less than that of cars A, which does not make sense to me):

Ttest_indResult(statistic=3.417269886834147, pvalue=0.9996071028578007)

If, however, I run this (without the alternative argument)

ttest_ind(table[B], table2[A], equal_var=False)

I get

Ttest_indResult(statistic=3.417269886834147, pvalue=0.0007857942843984687)

Why does the first call, which uses alternative, produce such a weirdly high p-value? Is there something I am misunderstanding about p-values?
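(The two p-values in the question are in fact consistent with each other: whenever the t statistic is positive, scipy's alternative="less" p-value equals 1 minus half the two-sided p-value. Since the original table and table2 data aren't available, this is a sketch with simulated samples whose means and sizes mirror the summary statistics above.)

```python
# Sketch with simulated data (the question's actual samples are not available).
# Sample sizes and means mirror the question; the standard deviation (2000) is
# an assumption, since the question only reports totals and means.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
b = rng.normal(loc=1541.8, scale=2000, size=2760)   # cars B (higher mean)
a = rng.normal(loc=1243.8, scale=2000, size=15992)  # cars A (lower mean)

# One-sided test of "mean(B) < mean(A)" -- the wrong direction here,
# so the p-value comes out close to 1.
t_less, p_less = ttest_ind(b, a, alternative="less", equal_var=False)

# Two-sided test: a small p-value, because the means really do differ.
t_two, p_two = ttest_ind(b, a, equal_var=False)

# With a positive t statistic, p_less is exactly 1 - p_two / 2,
# which matches the 0.9996... vs 0.00078... pair in the question.
print(p_less, p_two, 1 - p_two / 2)
```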

Upvotes: 0

Views: 607

Answers (1)

foglerit

Reputation: 8279

You have your sample order inverted. Use instead:

ttest_ind(table[A], table2[B], alternative="less", equal_var=False)

From the docs, under the alternative argument:

‘less’: the mean of the distribution underlying the first sample is less than the mean of the distribution underlying the second sample.
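In other words, alternative="less" is about the first sample, so the sample you hypothesize to have the smaller mean must come first. A sketch with simulated data (the original samples aren't available; means, sizes, and an assumed standard deviation of 2000 mirror the question's summary statistics):

```python
# With the lower-mean sample passed FIRST, alternative="less" tests the
# intended hypothesis "mean(A) < mean(B)" and yields a small p-value.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
a = rng.normal(1243.8, 2000, 15992)  # cars A (lower mean) -- first sample
b = rng.normal(1541.8, 2000, 2760)   # cars B (higher mean) -- second sample

t_stat, p_value = ttest_ind(a, b, alternative="less", equal_var=False)
print(t_stat, p_value)  # negative t statistic, small p-value
```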

Upvotes: 1
