Reputation: 1210
I use the following code to dynamically generate a list of dictionaries of every combination of incremental probabilities associated with a given list of items, such that the probabilities sum to 1. For example, if the increment_divisor were 2 (leading to an increment of 1/2, or 0.5) and the list contained 3 items ['a', 'b', 'c'], then the function should return
[{'a': 0.5, 'b': 0.5, 'c': 0.0},
{'a': 0.5, 'b': 0.0, 'c': 0.5},
{'a': 0.0, 'b': 0.5, 'c': 0.5},
{'a': 1.0, 'b': 0.0, 'c': 0.0},
{'a': 0.0, 'b': 1.0, 'c': 0.0},
{'a': 0.0, 'b': 0.0, 'c': 1.0}]
The code is as follows. The script generates the incrementer by calculating 1/x and then iteratively adds the incrementer to increments until the value is >= 1.0. I already know that Python floats are imprecise, but I want to be sure that the last value in increments will be something very close to 1.0.
import sys
from collections import OrderedDict
from itertools import permutations

def generate_hyp_space(list_of_items, increment_divisor):
    """Generate list of OrderedDicts filling the hypothesis space.

    Each OrderedDict is of the form ...
        { i1: 0.0, i2: 0.1, i3: 0.0, ...}
    ... where .values() sums to 1.

    Arguments:
    list_of_items -- items that receive prior weights
    increment_divisor -- Increment by 1/increment_divisor. For example,
        4 yields (0.0, 0.25, 0.5, 0.75, 1.0).
    """
    _LEN = len(list_of_items)
    if increment_divisor < _LEN:  # permutations() returns None if this is True
        print('WARN: increment_divisor too small, so was reset to '
              'len(list_of_items).', file=sys.stderr)
        increment_divisor = _LEN
    increment_size = 1/increment_divisor
    h_space = []
    increments = []
    incremental = 0.0
    while incremental <= 1.0:
        increments.append(incremental)
        incremental += increment_size
    for p in permutations(increments, _LEN):
        if sum(p) == 1.0:
            h_space.append(OrderedDict([(list_of_items[i], p[i])
                                        for i in range(_LEN)]))
    return h_space
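To make the concern concrete, here is a minimal check with an increment_divisor of 10, using the same accumulation loop as in the function above:

increment_size = 1/10
incremental = 0.0
increments = []
while incremental <= 1.0:
    increments.append(incremental)
    incremental += increment_size
print(increments[-1])         # 0.9999999999999999 on CPython, close to but not 1.0
print(increments[-1] == 1.0)  # False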
How large can the increment_divisor be before the imprecision of float breaks the reliability of the script? (Specifically, in the while incremental <= 1.0 and if sum(p) == 1.0 tests.)
This is a small example, but real use will involve a much larger permutation space. Is there a more efficient/effective way to achieve this goal? (I already plan to implement a cache.) Would numpy datatypes be useful here for speed or precision?
Upvotes: 0
Views: 131
Reputation: 1210
Thanks to @user2357112 for the suggestion to use stars and bars and to work with ints until the last step. I implemented stars_and_bars as a generator as follows:
def stars_and_bars(n, k, the_list=[]):
    """Distribute n probability tokens among k endings.

    Generator implementation of the stars-and-bars algorithm.

    Arguments:
    n -- number of probability tokens (stars)
    k -- number of endings/bins (bars+1)
    """
    if n == 0:
        yield the_list + [0]*k
    elif k == 1:
        yield the_list + [n]
    else:
        for i in range(n+1):
            yield from stars_and_bars(n-i, k-1, the_list+[i])
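For example, the integer counts can be turned into the probability dicts from the question by dividing by the divisor only at the last step (a plain dict is used here just to keep the sketch short):

items = ['a', 'b', 'c']
divisor = 2
h_space = [dict(zip(items, (count / divisor for count in counts)))
           for counts in stars_and_bars(divisor, len(items))]
# h_space contains the six distributions from the question, e.g.
# {'a': 0.5, 'b': 0.5, 'c': 0.0} and {'a': 1.0, 'b': 0.0, 'c': 0.0}.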
Upvotes: 0
Reputation: 280310
"The script generates the incrementer by calculating 1/x and then iteratively adds the incrementer to increments until the value is >= 1.0."
No, no, no. Just make a list of [0/x, 1/x, ..., (x-1)/x, x/x]
by dividing each integer from 0 to x by x:
increments = [i/increment_divisor for i in range(increment_divisor+1)]
# or for Python 2
increments = [1.0*i/increment_divisor for i in xrange(increment_divisor+1)]
The list will always have exactly the right number of elements, no matter what rounding errors occur.
With NumPy, this would be numpy.linspace:
increments = numpy.linspace(start=0, stop=1, num=increment_divisor+1)
As for your overall problem, working in floats at all is probably a bad idea. You should be able to do the whole thing with integers and only divide by increment_divisor at the end, so you don't have to deal with floating-point precision issues in sum(p) == 1.0. Also, itertools.permutations doesn't do what you want, since it doesn't allow repeated items in the same permutation.
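A quick check with the example from the question makes that obvious: with a divisor of 2 the increments are [0.0, 0.5, 1.0], and since permutations never reuses a value, no length-3 permutation can contain 0.5 twice, so nothing sums to 1:

from itertools import permutations

increments = [0.0, 0.5, 1.0]
print([p for p in permutations(increments, 3) if sum(p) == 1.0])  # prints []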
Instead of filtering permutations at all, you should use an algorithm based on the stars and bars idea to generate all possible ways to place len(list_of_items) - 1 separators between increment_divisor outcomes, and convert separator placements to probability dicts.
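Here is a sketch of that idea using itertools.combinations to choose the separator positions; the function name and its exact interface are just illustrative:

from itertools import combinations

def separator_placements(items, divisor):
    """Yield one probability dict per way of distributing `divisor`
    integer tokens among len(items) bins (stars and bars)."""
    k = len(items)
    # Choose positions for the k-1 separators among divisor + k - 1 slots.
    for bars in combinations(range(divisor + k - 1), k - 1):
        edges = (-1,) + bars + (divisor + k - 1,)
        # The bin counts are the gap sizes between consecutive separators.
        counts = [edges[i + 1] - edges[i] - 1 for i in range(k)]
        # Only divide by the divisor at the very end.
        yield dict(zip(items, (c / divisor for c in counts)))

# list(separator_placements(['a', 'b', 'c'], 2)) yields the six dicts
# from the question, e.g. {'a': 0.5, 'b': 0.5, 'c': 0.0}.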
Upvotes: 3