Reputation: 1210
I use the following code to dynamically generate a list of dictionaries of every combination of incremental probabilities associated with a given list of items, such that the probabilities sum to 1. For example, if the increment_divisor were 2 (leading to an increment of 1/2, or 0.5) and the list contained 3 items ['a', 'b', 'c'], then the function should return
[{'a': 0.5, 'b': 0.5, 'c': 0.0},
{'a': 0.5, 'b': 0.0, 'c': 0.5},
{'a': 0.0, 'b': 0.5, 'c': 0.5},
{'a': 1.0, 'b': 0.0, 'c': 0.0},
{'a': 0.0, 'b': 1.0, 'c': 0.0},
{'a': 0.0, 'b': 0.0, 'c': 1.0}]
The code is as follows. The script generates the incrementer by calculating 1/x and then iteratively adds the incrementer to increments until the value is >= 1.0. I already know that Python floats are imprecise, but I want to be sure that the last value in increments will be something very close to 1.0.
import sys
from collections import OrderedDict
from itertools import permutations

def generate_hyp_space(list_of_items, increment_divisor):
    """Generate list of OrderedDicts filling the hypothesis space.

    Each OrderedDict is of the form ...
        { i1: 0.0, i2: 0.1, i3: 0.0, ...}
    ... where .values() sums to 1.

    Arguments:
    list_of_items -- items that receive prior weights
    increment_divisor -- Increment by 1/increment_divisor. For example,
        4 yields (0.0, 0.25, 0.5, 0.75, 1.0).
    """
    _LEN = len(list_of_items)
    if increment_divisor < _LEN:  # permutations() returns None if this is True
        print('WARN: increment_divisor too small, so was reset to '
              'len(list_of_items).', file=sys.stderr)
        increment_divisor = _LEN
    increment_size = 1/increment_divisor
    h_space = []
    increments = []
    incremental = 0.0
    while incremental <= 1.0:
        increments.append(incremental)
        incremental += increment_size
    for p in permutations(increments, _LEN):
        if sum(p) == 1.0:
            h_space.append(OrderedDict([(list_of_items[i], p[i])
                                        for i in range(_LEN)]))
    return h_space
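To make the concern concrete, here is a minimal check with an increment_divisor of 10, using the same accumulation loop as in the function above:

increment_size = 1/10
incremental = 0.0
increments = []
while incremental <= 1.0:
    increments.append(incremental)
    incremental += increment_size
print(increments[-1])         # 0.9999999999999999 on CPython, close to but not 1.0
print(increments[-1] == 1.0)  # False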
How large can the increment_divisor be before the imprecision of float breaks the reliability of the script? (Specifically, in the while incremental <= 1.0 and if sum(p) == 1.0 tests.)
This is a small example, but real use will involve a much larger permutation space. Is there a more efficient/effective way to achieve this goal? (I already plan to implement a cache.) Would numpy datatypes be useful here for speed or precision?
Upvotes: 0
Views: 131
Reputation: 1210
Thanks to @user2357112 for the suggestion to use stars and bars and to work with ints until the last step. I implemented stars_and_bars as a generator as follows:
def stars_and_bars(n, k, the_list=[]):
    """Distribute n probability tokens among k endings.

    Generator implementation of the stars-and-bars algorithm.

    Arguments:
    n -- number of probability tokens (stars)
    k -- number of endings/bins (bars+1)
    """
    if n == 0:
        yield the_list + [0]*k
    elif k == 1:
        yield the_list + [n]
    else:
        for i in range(n+1):
            yield from stars_and_bars(n-i, k-1, the_list+[i])
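For example, the integer counts can be turned into the probability dicts from the question by dividing by the divisor only at the last step (a plain dict is used here just to keep the sketch short):

items = ['a', 'b', 'c']
divisor = 2
h_space = [dict(zip(items, (count / divisor for count in counts)))
           for counts in stars_and_bars(divisor, len(items))]
# h_space contains the six distributions from the question, e.g.
# {'a': 0.5, 'b': 0.5, 'c': 0.0} and {'a': 1.0, 'b': 0.0, 'c': 0.0}.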
Upvotes: 0
Reputation: 280310
"The script generates the incrementer by calculating 1/x and then iteratively adds the incrementer to increments until the value is >= 1.0."
No, no, no. Just make a list of [0/x, 1/x, ..., (x-1)/x, x/x]
by dividing each integer from 0 to x by x:
increments = [i/increment_divisor for i in range(increment_divisor+1)]
# or for Python 2
increments = [1.0*i/increment_divisor for i in xrange(increment_divisor+1)]
The list will always have exactly the right number of elements, no matter what rounding errors occur.
With NumPy, this would be numpy.linspace:
increments = numpy.linspace(start=0, stop=1, num=increment_divisor+1)
As for your overall problem, working in floats at all is probably a bad idea. You should be able to do the whole thing with integers and only divide by increment_divisor at the end, so you don't have to deal with floating-point precision issues in sum(p) == 1.0. Also, itertools.permutations doesn't do what you want, since it doesn't allow repeated items in the same permutation.
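A quick check with the example from the question makes that obvious: with a divisor of 2 the increments are [0.0, 0.5, 1.0], and since permutations never reuses a value, no length-3 permutation can contain 0.5 twice, so nothing sums to 1:

from itertools import permutations

increments = [0.0, 0.5, 1.0]
print([p for p in permutations(increments, 3) if sum(p) == 1.0])  # prints []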
Instead of filtering permutations at all, you should use an algorithm based on the stars and bars idea to generate all possible ways to place len(list_of_items) - 1 separators between increment_divisor outcomes, and convert separator placements to probability dicts.
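Here is a sketch of that idea using itertools.combinations to choose the separator positions; the function name and its exact interface are just illustrative:

from itertools import combinations

def separator_placements(items, divisor):
    """Yield one probability dict per way of distributing `divisor`
    integer tokens among len(items) bins (stars and bars)."""
    k = len(items)
    # Choose positions for the k-1 separators among divisor + k - 1 slots.
    for bars in combinations(range(divisor + k - 1), k - 1):
        edges = (-1,) + bars + (divisor + k - 1,)
        # The bin counts are the gap sizes between consecutive separators.
        counts = [edges[i + 1] - edges[i] - 1 for i in range(k)]
        # Only divide by the divisor at the very end.
        yield dict(zip(items, (c / divisor for c in counts)))

# list(separator_placements(['a', 'b', 'c'], 2)) yields the six dicts
# from the question, e.g. {'a': 0.5, 'b': 0.5, 'c': 0.0}.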
Upvotes: 3