Omar Sobhy

Reputation: 29

I am trying to calculate a p-value using statsmodels.

This is my code to get the numbers needed:

import statsmodels.api as sm
from statsmodels.stats.proportion import proportions_ztest

convert_old = len(df2[df2['group'] == 'control']['converted'] == 1)
convert_new = len(df2[df2['group'] == 'treatment']['converted'] == 1)
n_old =  len(df2[df2['group'] == 'control'])
n_new = len(df2[df2['group'] == 'treatment'])

The actual test call is:

stat, pval = proportions_ztest([convert_new ,convert_old], [n_new, n_old])

I am getting this result:

pvalue is : nan

I am also getting these warnings:

/opt/conda/lib/python3.6/site-packages/statsmodels/stats/weightstats.py:670: RuntimeWarning: invalid value encountered in double_scalars
  zstat = value / std_diff

/opt/conda/lib/python3.6/site-packages/statsmodels/stats/weightstats.py:672: RuntimeWarning: invalid value encountered in absolute
  pvalue = stats.norm.sf(np.abs(zstat))*2

Upvotes: 1

Views: 242

Answers (1)

Celius Stingher

Reputation: 18367

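I believe the issue is in how you compute convert_old and convert_new. Comparing ['converted'] == 1 returns a boolean Series with one True/False entry per row, so len() of it is simply the number of rows in the group, not the number of conversions. That makes convert_old equal to n_old and convert_new equal to n_new, so both proportions come out as 1, the pooled standard deviation is 0, and the z-statistic becomes 0/0, which is nan. To count only the converted rows, filter on both conditions: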

convert_old = len(df2[(df2['group'] == 'control') & (df2['converted'] == 1)])
convert_new = len(df2[(df2['group'] == 'treatment') & (df2['converted'] == 1)])
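
To check this on a small scale, here is a minimal sketch using a made-up 20-row DataFrame in place of your df2 (the data and the helper names is_control, is_treatment, is_converted and buggy_convert_old are only for illustration). It shows that the original expression returns the group size, and that counting with a combined boolean mask gives a usable p-value. Summing the mask is equivalent to taking len() of the filtered frame.

```python
import math

import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical toy data standing in for df2 (not the real dataset)
df2 = pd.DataFrame({
    "group": ["control"] * 10 + ["treatment"] * 10,
    "converted": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
                 + [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
})

is_control = df2["group"] == "control"
is_treatment = df2["group"] == "treatment"
is_converted = df2["converted"] == 1

# Original expression: len() of a boolean Series is the row count of the
# group, regardless of how many entries are True.
buggy_convert_old = len(df2[is_control]["converted"] == 1)

# Corrected counts: combine both conditions, then count the True entries.
convert_old = int((is_control & is_converted).sum())
convert_new = int((is_treatment & is_converted).sum())
n_old = int(is_control.sum())
n_new = int(is_treatment.sum())

stat, pval = proportions_ztest([convert_new, convert_old], [n_new, n_old])

print("buggy convert_old:", buggy_convert_old, "n_old:", n_old)
print("convert_old:", convert_old, "convert_new:", convert_new)
print("z statistic:", stat, "p-value:", pval)
```

With the buggy counts both proportions are exactly 1, which is what drives the standard deviation to 0 and the result to nan. With the corrected counts the proportions differ from 0 and 1, so the test has a well-defined statistic.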

Upvotes: 1
