Reputation: 559
Suppose I have the following dataframe df, where conv_rate = sales / visits:
theme visits sales conv_rate
0 brazil 34 2 5.9%
1 argentina 18 3 16.7%
2 spain 135 15 11.1%
3 uk 71 6 8.5%
4 france 80 4 5.0%
5 iceland 26 1 3.8%
6 chile 104 11 10.6%
7 italy 47 5 10.6%
# Total visits = 515
# Total sales = 47
# Mean conversion rate = 9.1%
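For reproducibility, here is a minimal sketch of how this dataframe could be built. It assumes conv_rate is stored as a plain fraction rather than the formatted percentage string shown above, and the names total_visits, total_sales and pooled_rate are introduced here only for illustration.

```python
import pandas as pd

# Data copied from the table above
df = pd.DataFrame({
    "theme": ["brazil", "argentina", "spain", "uk",
              "france", "iceland", "chile", "italy"],
    "visits": [34, 18, 135, 71, 80, 26, 104, 47],
    "sales": [2, 3, 15, 6, 4, 1, 11, 5],
})

# Per-country conversion rate as a fraction (assumption: not a "5.9%" string)
df["conv_rate"] = df["sales"] / df["visits"]

# Totals and the pooled rate quoted in the comments above
total_visits = df["visits"].sum()
total_sales = df["sales"].sum()
pooled_rate = total_sales / total_visits

print(df)
print(f"Pooled conversion rate: {pooled_rate:.1%}")
```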
I want to test which countries have a conversion rate which is significantly different to the conversion rate of the population mean (null hypothesis = no difference in conversion rate).
What test would be most suitable here? I believe I need a two-tailed test as the sample conversion rate may be higher or lower than the population mean. However I am unsure whether a t-test or z-test is most appropriate.
From what I've read, z-tests are best for large sample sizes (n>30), while t-tests are best for small sample sizes (n<30). Is this correct? Since some of my samples (e.g. spain) have a larger sample size than others (e.g. argentina), how do I decide which test is most suitable? I want the same test to be run on all rows (samples).
What I'm trying to do here is see which countries have a conversion rate that is 'significantly different' from the rate expected under the null hypothesis. I want to use a significance test to compute a 'test value' for each country (example below), then compare this value to a threshold to determine whether that country's conversion rate would be seen by chance in only 5%, 1% or 0.1% of cases (therefore giving me high confidence that the difference in conversion rate is 'significant' rather than down to chance).
theme visits sales conv_rate value
0 brazil 34 2 5.9% 1.57
1 argentina 18 3 16.7% 4.51
2 spain 135 15 11.1% 3.06
3 uk 71 6 8.5% 2.57
4 france 80 4 5.0% 1.88
5 iceland 26 1 3.8% 1.28
6 chile 104 11 10.6% 3.23
7 italy 47 5 10.6% 2.94
What test would be most suitable for this purpose? And can I construct the test in pandas or should I use scipy?
Upvotes: 3
Views: 253
Reputation: 46898
You can use a binomial test, where you treat "sales" as the number of successes, "visits" as the number of trials, and the success rate under the null hypothesis is the pooled rate, total sales / total visits:
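import pandas as pd
from scipy.stats import binomtest  # binom_test was deprecated and removed in SciPy 1.12

# Pooled conversion rate across all countries (the null-hypothesis rate)
p = df.sales.sum() / df.visits.sum()
# Exact two-sided binomial test per row: sales successes out of visits trials
df['p_binom'] = df.apply(lambda x: binomtest(int(x['sales']), int(x['visits']), p=p).pvalue, axis=1)
df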
theme visits sales conv_rate p_binom
0 brazil 34 2 5.9% 0.765868
1 argentina 18 3 16.7% 0.222923
2 spain 135 15 11.1% 0.452636
3 uk 71 6 8.5% 1.000000
4 france 80 4 5.0% 0.245689
5 iceland 26 1 3.8% 0.508992
6 chile 104 11 10.6% 0.607580
7 italy 47 5 10.6% 0.615161
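To connect these p-values to the 5%, 1% and 0.1% thresholds mentioned in the question, a minimal sketch could flag each country per threshold. It rebuilds the dataframe so it runs on its own and uses binomtest from SciPy 1.7 or later. The names p_null, alphas, the sig_ columns and significant_at_5pct are assumptions made for illustration.

```python
import pandas as pd
from scipy.stats import binomtest

# Data copied from the question
df = pd.DataFrame({
    "theme": ["brazil", "argentina", "spain", "uk",
              "france", "iceland", "chile", "italy"],
    "visits": [34, 18, 135, 71, 80, 26, 104, 47],
    "sales": [2, 3, 15, 6, 4, 1, 11, 5],
})

# Null-hypothesis rate: pooled sales over pooled visits
p_null = df["sales"].sum() / df["visits"].sum()

# Exact two-sided binomial p-value for each country
df["p_binom"] = [
    binomtest(int(k), int(n), p=p_null).pvalue
    for k, n in zip(df["sales"], df["visits"])
]

# Flag each country at the thresholds from the question
alphas = [0.05, 0.01, 0.001]
for alpha in alphas:
    df[f"sig_{alpha}"] = df["p_binom"] < alpha

# Countries whose rate differs from the pooled rate at the 5% level
significant_at_5pct = df.loc[df["sig_0.05"], "theme"].tolist()
print(df)
print(significant_at_5pct)
```

With this data every p-value in the table above is larger than 0.05, so no country would be flagged at any of the three thresholds.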
Upvotes: 1