Reputation: 475
Let's say I have a dictionary like the following, where each value is the probability of its key showing up in a text.
dict = {'a': 0.66, 'b': 0.07, 'c': 0.04, ...}  # and so on, so that the values sum to one
Say that I want to build another dictionary that has the ranges of those values as its values. Since we cannot use range() with floats, I first tried multiplying all the values by 100 so they turn into ints. Suppose we want to replace those values with their ranges. So, for example, 'a' will get range(0,66), 'b' range(66,73), 'c' range(73,77), etc. I have tried to do that with the following loop, but it doesn't work:
start = 0
end = 0
for k,v in dict.items():
    end+=int(v*100)
    range_dict[k]=range(start,end)
    start+=end
Can somebody please help me? I am going nuts figuring out what to do!
Upvotes: 0
Views: 9213
Reputation: 177600
Stolen with pride from the Python 3.3.0 documentation:
random - 9.6.2. Examples and Recipes - contains a weighted distribution algorithm.
itertools.accumulate - contains the accumulate algorithm.
The code below is written for Python 2.x:
import random
import bisect

D = {'a':0.66,'b':0.07,'c':0.04,'d':0.20,'e':0.03}

# This function is in Python 3.2+ itertools module.
def accumulate(iterable):
    'Return running totals'
    # accumulate([1,2,3,4,5]) --> 1 3 6 10 15
    it = iter(iterable)
    total = next(it)
    yield total
    for element in it:
        total = total + element
        yield total

# Extract the weights and build a cumulative distribution.
choices, weights = zip(*D.items())
cumdist = list(accumulate(weights))

# Make 1000 random selections
L = [choices[bisect.bisect(cumdist, random.random() * cumdist[-1])]
     for _ in xrange(1000)]

# Display the results
for c in sorted(D.keys()):
    print '{} {:3d}'.format(c,L.count(c))
Output:
a 652
b 72
c 43
d 200
e 33
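For readers on Python 3, here is a rough port of the same idea. It is only a sketch: it assumes `itertools.accumulate` is available (Python 3.2+), and the `pick` helper and the seeded `rng` are names introduced here for illustration, not part of the documentation recipe. The counts will differ from the run above because the random draws differ.

```python
import bisect
import random
from itertools import accumulate

D = {'a': 0.66, 'b': 0.07, 'c': 0.04, 'd': 0.20, 'e': 0.03}

# Split the dict into parallel tuples and build the cumulative distribution.
choices, weights = zip(*D.items())
cumdist = list(accumulate(weights))

def pick(rng=random):
    """Return one key, chosen with probability proportional to its weight.

    `pick` is a hypothetical helper name used only in this sketch.
    """
    x = rng.random() * cumdist[-1]            # uniform in [0, total weight)
    return choices[bisect.bisect(cumdist, x)]  # first bucket whose upper edge exceeds x

rng = random.Random(0)  # seeded only so repeated runs give the same sample
sample = [pick(rng) for _ in range(1000)]

# Display the results
for c in sorted(D):
    print('{} {:3d}'.format(c, sample.count(c)))
```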
Upvotes: 0
Reputation: 353039
If you change
start += end
to
start = end
it should work (using xrange here to make it more visible):
>>> d = {'a':0.66,'b':0.07,'c':0.04}
>>> start = 0
>>> end = 0
>>> range_dict = {}
>>> for k,v in d.items():
...     end+=int(v*100)
...     range_dict[k]=xrange(start,end)
...     start=end
...
>>> range_dict
{'a': xrange(66), 'c': xrange(66, 70), 'b': xrange(70, 77)}
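In Python 3, where xrange no longer exists and range is already lazy, the same fix might look like the sketch below. The function name `build_range_dict` and the `scale` parameter are illustrative, not from the question. Using round() instead of int() is an assumption on my part: a float product like v*100 can come out slightly below a whole number, and int() would then truncate it.

```python
def build_range_dict(probs, scale=100):
    """Map each key to a range covering its share of 0..scale.

    Hypothetical helper for illustration; assumes the probabilities
    are multiples of 1/scale so each one rounds to a whole number.
    """
    range_dict = {}
    start = 0
    for key, p in probs.items():
        end = start + round(p * scale)  # round() rather than int() to avoid truncation
        range_dict[key] = range(start, end)
        start = end                     # the next range begins where this one ended
    return range_dict

d = {'a': 0.66, 'b': 0.07, 'c': 0.04}
print(build_range_dict(d))
# On Python 3.7+, where dicts keep insertion order:
# {'a': range(0, 66), 'b': range(66, 73), 'c': range(73, 77)}
```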
But if, as @Satoru.Logic guessed, you want a weighted random choice, there are much better ways. Eli Bendersky has a good overview of approaches in Python.
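One such way, if you are on Python 3.6 or later, is the standard library's `random.choices`, which accepts a `weights` argument directly. This is a sketch of one option, not necessarily an approach covered in that overview. The five-key dictionary and the seeded `rng` are assumptions made so the example is self-contained and repeatable.

```python
import random

# Example probabilities (assumed for illustration; they sum to one).
d = {'a': 0.66, 'b': 0.07, 'c': 0.04, 'd': 0.20, 'e': 0.03}

rng = random.Random(42)  # seeded only so repeated runs give the same sample

# Draw 1000 keys, each with probability proportional to its weight.
sample = rng.choices(list(d), weights=list(d.values()), k=1000)

for key in sorted(d):
    print(key, sample.count(key))
```

No range dictionary or manual cumulative sum is needed here, because `random.choices` takes the relative weights as they are.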
Upvotes: 4