Eran

Reputation: 1702

Unit Testing probability

I have a method that creates one of two different instances (M or N). A given fraction x of the time (math.random * x) the method creates object M, and the rest of the time it creates object N.

I have written unit tests that mock the random number, so I can be sure the method behaves as expected. However, I am not sure how (or whether) to test that the probability is accurate; for example, if x = 0.1 I expect 1 out of 10 cases to return instance M.

How do I test this functionality?

Upvotes: 3

Views: 2644

Answers (2)

Aaron Hall

Reputation: 395085

I'll do this in Python.

First describe your functionality:

def binomial_process(x):
    '''
    given a probability, x, return M with that probability,
    else return N with probability 1-x
    maybe: return random.random() < x
    '''

Then write a first implementation to test against:

import random
def binom(x):
    # random.random() is uniform on [0, 1), so this is True with probability x
    return random.random() < x

Then write your test functions, first a setup function to put together your data from an expensive process:

def setUp(x, n):
    counter = dict()
    for _ in range(n):
        result = binom(x)
        counter[result] = counter.get(result, 0) + 1
    return counter

Then the actual test:

import scipy.stats
trials = 1000000


def test_binomial_process():
    # Note: newer SciPy releases deprecate binom_test in favor of
    # scipy.stats.binomtest(k, n, p).pvalue.
    ps = (.01, .1, .33, .5, .66, .9, .99)
    # One sample of `trials` draws for each probability under test.
    x_01 = setUp(.01, trials)
    x_1 = setUp(.1, trials)
    x_33 = setUp(.33, trials)
    x_5 = setUp(.5, trials)
    x_66 = setUp(.66, trials)
    x_9 = setUp(.9, trials)
    x_99 = setUp(.99, trials)
    # Two-sided p-value for the null hypothesis "P(True) equals p".
    x_01_result = scipy.stats.binom_test(x_01.get(True, 0), trials, .01)
    x_1_result = scipy.stats.binom_test(x_1.get(True, 0), trials, .1)
    x_33_result = scipy.stats.binom_test(x_33.get(True, 0), trials, .33)
    x_5_result = scipy.stats.binom_test(x_5.get(True, 0), trials)
    x_66_result = scipy.stats.binom_test(x_66.get(True, 0), trials, .66)
    x_9_result = scipy.stats.binom_test(x_9.get(True, 0), trials, .9)
    x_99_result = scipy.stats.binom_test(x_99.get(True, 0), trials, .99)
    setups = (x_01, x_1, x_33, x_5, x_66,  x_9, x_99)
    results = (x_01_result, x_1_result, x_33_result, x_5_result,
               x_66_result, x_9_result, x_99_result)
    print('can reject the hypothesis that the following tests are NOT the')
    print('results of a binomial process (with their given respective')
    print('probabilities) with probability < .01, {0} trials each'.format(trials))
    for p, setup, result in zip(ps, setups, results):
        print('p = {0}'.format(p), setup, result, 'reject null' if result < .01 else 'fail to reject')

Then write your function (ok, we already did: it is the binom defined above).

And run your tests:

test_binomial_process()

The output from my last run:

can reject the hypothesis that the following tests are NOT the
results of a binomial process (with their given respective
probabilities) with probability < .01, 1000000 trials each
p = 0.01 {False: 10084, True: 989916} 4.94065645841e-324 reject null
p = 0.1 {False: 100524, True: 899476} 1.48219693752e-323 reject null
p = 0.33 {False: 100633, True: 899367} 2.96439387505e-323 reject null
p = 0.5 {False: 500369, True: 499631} 0.461122365668 fail to reject
p = 0.66 {False: 900144, True: 99856} 2.96439387505e-323 reject null
p = 0.9 {False: 899988, True: 100012} 1.48219693752e-323 reject null
p = 0.99 {False: 989950, True: 10050} 4.94065645841e-324 reject null

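Every row except p = 0.5 rejects the null hypothesis. The run that produced this output used random.random() > x, which returns True with probability 1 - x rather than x, and it drew the 0.33 and 0.66 samples with setUp(.1, trials) and setUp(.9, trials); the True counts in each row show this, and the test is catching exactly those mistakes. Why do we fail to reject on p=0.5? Let's look at the help on scipy.stats.binom_test: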

Help on function binom_test in module scipy.stats.morestats:

binom_test(x, n=None, p=0.5, alternative='two-sided')
    Perform a test that the probability of success is p.

    This is an exact, two-sided test of the null hypothesis
    that the probability of success in a Bernoulli experiment
    is `p`.

    Parameters
    ----------
    x : integer or array_like
        the number of successes, or if x has length 2, it is the
        number of successes and the number of failures.
    n : integer
        the number of trials.  This is ignored if x gives both the
        number of successes and failures
    p : float, optional
        The hypothesized probability of success.  0 <= p <= 1. The
        default value is p = 0.5
    alternative : {'two-sided', 'greater', 'less'}, optional
        Indicates the alternative hypothesis. The default value is
        'two-sided'.

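So .5 is the default null hypothesis when p is omitted, which is why the p = 0.5 call passes no third argument. It also makes sense not to reject the null hypothesis in this case: at x = .5, a function that returns True with probability 1 - x is indistinguishable from one that returns True with probability x, so this row cannot expose a reversed comparison.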
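
If SciPy is not available, a similar check can be approximated with the standard library alone. The following is a minimal sketch, not the exact test that binom_test performs: it assumes a normal approximation to the binomial (a z-score on the success count) and a seeded generator so the test is repeatable. The names make_binom, z_score, and check_probability, the seed, and the 5-sigma threshold are my own choices for illustration.

```python
import math
import random


def make_binom(rng):
    """Return a binom(x) that draws from the given random.Random instance."""
    def binom(x):
        # True with probability x (note '<', not '>')
        return rng.random() < x
    return binom


def z_score(successes, trials, p):
    """Distance between observed and expected successes, in standard deviations."""
    expected = trials * p
    std_dev = math.sqrt(trials * p * (1 - p))
    return (successes - expected) / std_dev


def check_probability(binom, p, trials=100000, max_sigma=5.0):
    """Return True if the success count is within max_sigma of trials * p."""
    successes = sum(1 for _ in range(trials) if binom(p))
    return abs(z_score(successes, trials, p)) <= max_sigma


rng = random.Random(12345)  # fixed seed: the same sequence on every run
binom = make_binom(rng)
results = {p: check_probability(binom, p)
           for p in (.01, .1, .33, .5, .66, .9, .99)}
```

A wide threshold such as 5 sigma keeps false alarms rare while still catching gross errors like a reversed comparison, which lands hundreds of standard deviations away at p = .1.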

Upvotes: 1

Aaron Digulla

Reputation: 328614

Split the test. The first test should let you define what the random number generator returns (I assume you already have that). This part of the test just answers "do I get the expected result if the random number generator returns some given value?".
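
As a rough sketch of that first test, assuming the generator can be passed in as a parameter: the classes M and N stand in for the asker's real types, and create and its rand argument are hypothetical names, not taken from the question.

```python
import random


class M:
    pass


class N:
    pass


def create(x, rand=random.random):
    """Return an M with probability x, otherwise an N.

    `rand` is any zero-argument callable returning a float in [0, 1).
    """
    return M() if rand() < x else N()


def test_create_with_fixed_values():
    # A stubbed generator makes the outcome fully deterministic.
    assert isinstance(create(0.1, rand=lambda: 0.05), M)
    assert isinstance(create(0.1, rand=lambda: 0.5), N)
    # Boundary: a draw exactly equal to x falls on the N side because of '<'.
    assert isinstance(create(0.1, rand=lambda: 0.1), N)


test_create_with_fixed_values()
```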

The second test should just run the real random number generator and apply some statistical analysis to the results (like counting how often it returns each value).

I suggest wrapping the real generator in a wrapper that returns "create M" and "create N" (or possibly just 0 and 1). That way, you can separate the implementation from the place where it's used (the code which creates the two different instances shouldn't need to know how the generator is initialized or how you turn the real result into "create X").
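
One possible shape for such a wrapper, together with the counting-style second test, is sketched below. Chooser, next_choice, and frequency_of_m are hypothetical names, and the seed and trial count are arbitrary; treat it as an illustration under those assumptions rather than a definitive design.

```python
import random


class Chooser:
    """Wraps a random source and only ever answers 'M' or 'N'."""

    def __init__(self, x, rand=random.random):
        self.x = x        # probability of answering 'M'
        self.rand = rand  # zero-argument callable returning a float in [0, 1)

    def next_choice(self):
        return 'M' if self.rand() < self.x else 'N'


def frequency_of_m(chooser, trials):
    """Fraction of `trials` calls in which the chooser answered 'M'."""
    hits = sum(1 for _ in range(trials) if chooser.next_choice() == 'M')
    return hits / float(trials)


# Statistical test: a seeded generator keeps the run repeatable.
seeded = random.Random(2024)
chooser = Chooser(0.1, rand=seeded.random)
observed = frequency_of_m(chooser, 100000)
```

The calling code then depends only on next_choice(), so the deterministic test can pass a stub for rand while the statistical test passes a real generator.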

Upvotes: 1
