Reputation:
I have data like this:
d = (
(701, 1, 0.2),
(701, 2, 0.3),
(701, 3, 0.5),
(702, 1, 0.2),
(702, 2, 0.3),
(703, 3, 0.5)
)
Where (701, 1, 0.2) = (id1, id2, priority)
Is there a pretty way to choose id2 if I know id1, using priority?
Func(701) should return:
1 - in 20% cases
2 - 30%
3 - 50%
Percent will be rough of course
Upvotes: 17
Views: 11141
Reputation: 20141
Generate a Cumulative Distribution Function for each ID1 thus:
cdfs = defaultdict()
for id1,id2,val in d:
prevtotal = cdfs[id1][-1][0]
newtotal = prevtotal + val
cdfs[id1].append( (newtotal,id2) )
So you will have
cdfs = { 701 : [ (0.2,1), (0.5,2), (1.0,3) ],
702 : [ (0.2,1), (0.5,2) ],
703 : [ (0.5,3) ] }
Then generate a random number and search for it in the list.
def func(id1):
max = cdfs[id1][-1][0]
rand = random.random()*max
for upper,id2 in cdfs[id1]:
if upper>rand:
return id2
return None
Upvotes: 7
Reputation: 63719
You can also create a 100-element list for each value, and then let random.choice do the selecting from a seeded list whose members are loaded in the weighting that you want:
import random
from collections import defaultdict
d = (
(701, 1, 0.2),
(701, 2, 0.3),
(701, 3, 0.5),
(702, 1, 0.2),
(702, 2, 0.3),
(702, 3, 0.5)
)
class WeightedLookup(object):
def __init__(self, valueTupleList):
self.valdict = defaultdict(list)
for key, val, prob in valueTupleList:
self.valdict[key] += [val]*(int)(prob*100)
def __getitem__(self,key):
return random.choice(self.valdict[key])
lookup = WeightedLookup(d)
# test out our lookup distribution, sample it 100000 times
res = { 1:0, 2:0, 3:0 }
for i in range(100000):
res[lookup[701]] += 1
# print how many times each value was returned
for k in (1,2,3):
print k, res[k]
Prints:
1 20059
2 30084
3 49857
Upvotes: 0
Reputation: 76067
Two ideas (allow me to illustrate it with separated options and ratios for the sake of clarity in the argument names, if they're packed in a tuple you can save the "zip"):
a) Denormalize the weights to get integer ratios, then put in a list as many copies as the ratio and use random.choice
.
def choice_with_ratios(options, ratios):
tmp = sum([[v]*n for v, n in zip(options, ratios)], [])
return random.choice(tmp)
b) Use the normalized weights and start summing up until you reach a random generated uniform value
def choice_with_weights(options, weights):
s = 0
r = random.random()
for v, w in zip(options, weights):
s += w
if s >= r: break
return v
By the way, if the first field is used as a key, you should have it in a dictionary, like:
d = {
701: ((1, 0.2), (2, 0.3), (3, 0.5),
702: ((1, 0.3), (2, 0.2), (3, 0.5)
}
Upvotes: 0
Reputation: 38346
Realizing that my first answer was quite buggy in its math, I have produced a new idea. I believe the algorithm here is similar to that of several of the other answers, but this implementation seems to qualify for the "pretty" (if that equals simple) requirement of the question:
def func(id):
rnd = random()
sum = 0
for row in d:
if row[0] == id:
sum = sum + row[2]
if rnd < sum:
return row[1]
With the example data from the OP it goes like this:
< 0.2
return the first element< 0.5
return the second element< 1.0
) return the third elementUpvotes: 3
Reputation: 26586
A very quick hack:
import random
d = {
701: [(1,0.2),(2,0.3),(3,0.5)],
702: [(1,0.2),(2,0.3),(3,0.5)]
}
def func(value):
possible_values=d[value]
total=sum(p[-1] for p in possible_values)
random_value=random.random()
prob=possible_values[0][-1]/total
index=1
while index<len(possible_values) and prob<random_value:
prob+=possible_values[index][-1]/total
index+=1
return possible_values[index-1][0]
if __name__=='__main__':
testcases=1000
cnt=[0,0,0]
for case in xrange(testcases):
answer=func(701)
cnt[answer-1]+=1
for i in xrange(3):
print "Got %d %f%% of the time"%(i+1,float(cnt[i])/testcases*100)
It isn't pretty, but it is the first thing that came to mind, and appears to work as expected.
What this does is to get a random value in the interval [0,1) (using random.random()). It then uses whether the random value falls in the intervals [0,0.2),[0.2,0.5) or [0.5,1), to figure out which value to return.
Upvotes: 1
Reputation: 7098
If your percent values will not be more precise than whole percent values, use a random number generator to generate a number 0-99.
Then in your function, use (programmatic) cases to choose the correct number. For example (clean this up):
if 701 if random_num < 20 return 1 else if random number < 50 // ( 20 + 30 ) return 2 else if random number < 100 // ( 20 + 30 + 50 ) return 3 else // error
Upvotes: 1
Reputation: 25533
Use a discrete uniform distribution from the random module over a sufficient number of values, then partition it:
For example, for case 701 use a distribution over 10 values, for 2 values return 1, for another 3, return 2, and for the other 5, return 3.
You can build any distribution using enough uniform distributions :)
Upvotes: 2