Alan

Reputation: 559

What Significance Test for Quantitative Dataset (Python Pandas)

Suppose I have the following dataframe df where conv_rate = sales / visits:

      theme      visits   sales   conv_rate
0     brazil        34        2        5.9%
1     argentina     18        3       16.7%
2     spain        135       15       11.1%
3     uk            71        6        8.5%
4     france        80        4        5.0%
5     iceland       26        1        3.8%
6     chile        104       11       10.6%
7     italy         47        5       10.6%

# Total visits = 515
# Total sales = 47
# Mean conversion rate = 9.1%
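For reproducibility, the table above can be rebuilt roughly as follows. This is only a sketch: `conv_rate` is stored here as a float rather than a formatted percentage, and `pooled_rate` is a name introduced for illustration.

```python
import pandas as pd

# Raw counts copied from the table above; conv_rate is derived from them.
df = pd.DataFrame({
    "theme": ["brazil", "argentina", "spain", "uk",
              "france", "iceland", "chile", "italy"],
    "visits": [34, 18, 135, 71, 80, 26, 104, 47],
    "sales": [2, 3, 15, 6, 4, 1, 11, 5],
})
df["conv_rate"] = df["sales"] / df["visits"]

total_visits = df["visits"].sum()
total_sales = df["sales"].sum()
# Overall rate: total sales divided by total visits (hypothetical name).
pooled_rate = total_sales / total_visits

print(total_visits, total_sales, f"{pooled_rate:.1%}")  # 515 47 9.1%
```

Note that the 9.1% figure is total sales divided by total visits, not the unweighted average of the per-country rates.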

I want to test which countries have a conversion rate that differs significantly from the overall (population) conversion rate (null hypothesis = no difference in conversion rate).

What test would be most suitable here? I believe I need a two-tailed test as the sample conversion rate may be higher or lower than the population mean. However I am unsure whether a t-test or z-test is most appropriate.

From what I've read, z-tests are best for large sample sizes (n>30), while t-tests are best for small sample sizes (n<30). Is this correct? Since some of my samples (e.g. spain) have a larger sample size than others (e.g. argentina), how do I decide which test is most suitable? I want the same test to be run on all rows (samples).

What I'm trying to do here is see which countries have a conversion rate that is 'significantly different' from the rate expected under the null hypothesis. I want to use a significance test to compute a 'test value' for each country (for example below), then compare this value to a threshold to determine whether a conversion rate that extreme would be expected in only 5%, 1% or 0.1% of cases if the null hypothesis were true (therefore giving me high confidence that the difference in conversion rate is 'significant' rather than down to chance).

      theme      visits   sales   conv_rate     value
0     brazil        34        2        5.9%      1.57
1     argentina     18        3       16.7%      4.51
2     spain        135       15       11.1%      3.06
3     uk            71        6        8.5%      2.57
4     france        80        4        5.0%      1.88
5     iceland       26        1        3.8%      1.28
6     chile        104       11       10.6%      3.23
7     italy         47        5       10.6%      2.94

What test would be most suitable for this purpose? And can I construct the test in pandas or should I use scipy?

Upvotes: 3

Views: 253

Answers (1)

StupidWolf

Reputation: 46898

You can use a binomial test, where you treat each sale as a "success", each visit as a "trial", and the expected probability of success as the overall conversion rate, i.e. total sales / total visits:

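import pandas as pd
# binomtest replaces binom_test, which was deprecated and has been removed from recent SciPy releases
from scipy.stats import binomtest

# Overall conversion rate (total sales / total visits), used as the null probability
p = df.sales.sum() / df.visits.sum()

# Exact two-sided binomial test per row; .pvalue extracts the p-value from the result object
df['p_binom'] = [binomtest(int(k), int(n), p=p).pvalue for k, n in zip(df.sales, df.visits)]
df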

      theme      visits   sales   conv_rate    p_binom
0     brazil        34        2        5.9%   0.765868
1     argentina     18        3       16.7%   0.222923
2     spain        135       15       11.1%   0.452636
3     uk            71        6        8.5%   1.000000
4     france        80        4        5.0%   0.245689
5     iceland       26        1        3.8%   0.508992
6     chile        104       11       10.6%   0.607580
7     italy         47        5       10.6%   0.615161
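To get from these p-values to the 5% / 1% / 0.1% thresholds mentioned in the question, one option is to compare each p-value against those levels. Below is a minimal, self-contained sketch, assuming a SciPy version that provides `binomtest` (1.7 or later). The names `p0` and `ALPHAS` and the `sig_*` columns are made up for illustration.

```python
import pandas as pd
from scipy.stats import binomtest

# Counts taken from the question's table.
df = pd.DataFrame({
    "theme": ["brazil", "argentina", "spain", "uk",
              "france", "iceland", "chile", "italy"],
    "visits": [34, 18, 135, 71, 80, 26, 104, 47],
    "sales": [2, 3, 15, 6, 4, 1, 11, 5],
})

# Null probability: overall conversion rate (total sales / total visits).
p0 = df["sales"].sum() / df["visits"].sum()

# Exact two-sided binomial p-value for each country.
df["p_binom"] = [
    binomtest(int(k), int(n), p=p0, alternative="two-sided").pvalue
    for k, n in zip(df["sales"], df["visits"])
]

# Hypothetical significance levels; one boolean column per level.
ALPHAS = (0.05, 0.01, 0.001)
for alpha in ALPHAS:
    df[f"sig_{alpha}"] = df["p_binom"] < alpha

print(df[["theme", "p_binom"] + [f"sig_{a}" for a in ALPHAS]])
```

With the data in the question, the smallest p-value is about 0.22 (argentina), so none of the countries would be flagged at any of these levels.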

Upvotes: 1
