Reputation: 111
I have found two different ways of running a chi-square test for an A/B test that compares the conversion rates of a control group and a test group.
The first uses proportions_chisquare from statsmodels; the second uses chi2_contingency from scipy.
It seems that chi2_contingency always returns a higher p-value than proportions_chisquare. Any idea what causes the difference, and which test is more applicable to a simple A/B test?
I apologize for not including an example earlier; here is one:
Example 1 (p-value = 0.037):
import statsmodels.stats.proportion as proportion
import numpy as np
conv_a = 20
conv_b = 35
clicks_a = 500
clicks_b = 500
converted = np.array([conv_a, conv_b])
clicks = np.array([clicks_a,clicks_b])
chisq, pvalue, table = proportion.proportions_chisquare(converted, clicks)
print('Results are ','chisq =%.3f, pvalue = %.3f'%(chisq, pvalue))
Example 2 (p-value = 0.0521):
import numpy
import scipy.stats
control_size = 500
A_CONVERSIONS = 20
A_NO_CONVERSIONS= control_size - A_CONVERSIONS
test_size = 500
B_CONVERSIONS = 35
B_NO_CONVERSIONS = test_size - B_CONVERSIONS
data = numpy.array([[A_NO_CONVERSIONS, A_CONVERSIONS],
                    [B_NO_CONVERSIONS, B_CONVERSIONS]])
chi_square, p_value = scipy.stats.chi2_contingency(data)[:2]
print('χ²: %.4f' % chi_square)
print('p-value: %.4f' % p_value)
Upvotes: 0
Views: 5300
Reputation: 50698
Further to my comment above, here is a minimal reproducible example showing the use of proportions_chisquare from statsmodels and chi2_contingency from scipy. As expected, the results agree.
Let's generate some sample data; data are taken from Fleiss JL, Statistical methods for rates and proportions, New York: John Wiley & Sons (1981).
import pandas as pd
data = pd.DataFrame({
"Smokers": [83, 90, 129, 70],
"Patients": [86, 93, 136, 82]
})
Results from both tests are given below:
import statsmodels.stats.proportion as ssp
(chi2, p, arr) = ssp.proportions_chisquare(count = data.Smokers, nobs = data.sum(axis = 1))
"chi2 = %4.2f, p-value = %4.3f" % (chi2, p)
#'chi2 = 0.42, p-value = 0.936'
import scipy.stats as ss
(chi2, p, df, arr) = ss.chi2_contingency(data, correction = False)
"chi2 = %4.2f, p-value = %4.3f" % (chi2, p)
#'chi2 = 0.42, p-value = 0.936'
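For the 2x2 table in the question, the gap between the two p-values (0.037 vs 0.0521) appears to come from Yates' continuity correction. chi2_contingency applies it by default when the table has one degree of freedom, whereas the proportions_chisquare result matches the plain Pearson statistic. The sketch below uses only scipy and the counts from the question; the variable names are mine, chosen for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2x2 table from the question: rows are groups A and B,
# columns are [not converted, converted].
table = np.array([[500 - 20, 20],
                  [500 - 35, 35]])

# Default call: Yates' continuity correction is applied (1 degree of freedom).
chi2_corrected, p_corrected = chi2_contingency(table)[:2]

# Same table with the correction switched off: plain Pearson chi-square.
chi2_uncorrected, p_uncorrected = chi2_contingency(table, correction=False)[:2]

print("corrected:   chi2 = %.3f, p-value = %.4f" % (chi2_corrected, p_corrected))
print("uncorrected: chi2 = %.3f, p-value = %.4f" % (chi2_uncorrected, p_uncorrected))
```

The correction shrinks each |observed - expected| difference by 0.5 before squaring, so the corrected statistic is smaller and its p-value larger. Whether to keep the correction is a judgment call; with cell counts as large as these it is often left off, but that is a convention rather than a rule.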
As to the difference between a chi-square test and z-test (test of equal proportions), I refer to an excellent post on Cross Validated.
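For the two-group case specifically, the relationship can be checked numerically: the square of the pooled two-proportion z-statistic equals the uncorrected Pearson chi-square, and the two-sided p-values coincide. This is a minimal sketch using the counts from the question, with the z-statistic computed by hand rather than through a library helper; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency, norm

conv = np.array([20, 35])    # conversions in groups A and B
n = np.array([500, 500])     # users in groups A and B

# Pooled two-proportion z-test, computed by hand.
p_hat = conv / n
p_pool = conv.sum() / n.sum()
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n).sum())
z = (p_hat[0] - p_hat[1]) / se
p_z = 2 * norm.sf(abs(z))    # two-sided p-value

# Uncorrected Pearson chi-square on the equivalent 2x2 table.
table = np.column_stack([n - conv, conv])
chi2, p_chi2 = chi2_contingency(table, correction=False)[:2]

print("z^2  = %.4f, p-value = %.4f" % (z ** 2, p_z))
print("chi2 = %.4f, p-value = %.4f" % (chi2, p_chi2))
```

The equivalence holds only for two groups and a two-sided alternative; with more than two groups, as in the Fleiss data above, only the chi-square form applies.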
Upvotes: 6