user251707
user251707

Reputation:

Random weighted choice

I have data like this:

d = (
  (701, 1, 0.2),
  (701, 2, 0.3),
  (701, 3, 0.5),
  (702, 1, 0.2),
  (702, 2, 0.3),
  (703, 3, 0.5)
)

Where (701, 1, 0.2) = (id1, id2, priority)

Is there a pretty way to choose id2 if I know id1, using priority?

Func(701) should return:
  1 - in 20% cases
  2 - 30%
  3 - 50%

Percent will be rough of course

Upvotes: 17

Views: 11141

Answers (7)

Phil H
Phil H

Reputation: 20141

Generate a Cumulative Distribution Function for each ID1 thus:

cdfs = defaultdict()
for id1,id2,val in d:
    prevtotal = cdfs[id1][-1][0]
    newtotal = prevtotal + val
    cdfs[id1].append( (newtotal,id2) )

So you will have

cdfs = { 701 : [ (0.2,1), (0.5,2), (1.0,3) ], 
         702 : [ (0.2,1), (0.5,2) ],
         703 : [ (0.5,3) ] }

Then generate a random number and search for it in the list.

def func(id1):
    max = cdfs[id1][-1][0]
    rand = random.random()*max
    for upper,id2 in cdfs[id1]:
        if upper>rand:
            return id2
    return None

Upvotes: 7

PaulMcG
PaulMcG

Reputation: 63719

You can also create a 100-element list for each value, and then let random.choice do the selecting from a seeded list whose members are loaded in the weighting that you want:

import random
from collections import defaultdict

d = ( 
  (701, 1, 0.2), 
  (701, 2, 0.3), 
  (701, 3, 0.5), 
  (702, 1, 0.2), 
  (702, 2, 0.3), 
  (702, 3, 0.5) 
) 

class WeightedLookup(object):
    def __init__(self, valueTupleList):
        self.valdict = defaultdict(list)
        for key, val, prob in valueTupleList:
            self.valdict[key] += [val]*(int)(prob*100)

    def __getitem__(self,key):
        return random.choice(self.valdict[key])


lookup = WeightedLookup(d)

# test out our lookup distribution, sample it 100000 times
res = { 1:0, 2:0, 3:0 }
for i in range(100000):
    res[lookup[701]] += 1

# print how many times each value was returned
for k in (1,2,3):
    print k, res[k]

Prints:

1 20059
2 30084
3 49857

Upvotes: 0

fortran
fortran

Reputation: 76067

Two ideas (allow me to illustrate it with separated options and ratios for the sake of clarity in the argument names, if they're packed in a tuple you can save the "zip"):

a) Denormalize the weights to get integer ratios, then put in a list as many copies as the ratio and use random.choice.

def choice_with_ratios(options, ratios):
    tmp = sum([[v]*n for v, n in zip(options, ratios)], [])
    return random.choice(tmp)

b) Use the normalized weights and start summing up until you reach a random generated uniform value

def choice_with_weights(options, weights):
    s = 0
    r = random.random()
    for v, w in zip(options, weights):
        s += w
        if s >= r: break
    return v

By the way, if the first field is used as a key, you should have it in a dictionary, like:

d = {
  701: ((1, 0.2), (2, 0.3), (3, 0.5),
  702: ((1, 0.3), (2, 0.2), (3, 0.5)
}

Upvotes: 0

Jørn Schou-Rode
Jørn Schou-Rode

Reputation: 38346

Realizing that my first answer was quite buggy in its math, I have produced a new idea. I believe the algorithm here is similar to that of several of the other answers, but this implementation seems to qualify for the "pretty" (if that equals simple) requirement of the question:

def func(id):
    rnd = random()
    sum = 0
    for row in d:
        if row[0] == id:
            sum = sum + row[2]
            if rnd < sum:
                return row[1]

With the example data from the OP it goes like this:

  • Pick a random number between 0 and 1.0
  • If the number is < 0.2 return the first element
  • Else if the number is < 0.5 return the second element
  • Else (if the number is < 1.0) return the third element

Upvotes: 3

MAK
MAK

Reputation: 26586

A very quick hack:

import random

d = {
    701: [(1,0.2),(2,0.3),(3,0.5)],
    702: [(1,0.2),(2,0.3),(3,0.5)]
}

def func(value):
    possible_values=d[value]
    total=sum(p[-1] for p in possible_values)
    random_value=random.random()
    prob=possible_values[0][-1]/total
    index=1
    while index<len(possible_values) and prob<random_value:
        prob+=possible_values[index][-1]/total
        index+=1
    return possible_values[index-1][0]

if __name__=='__main__':
    testcases=1000
    cnt=[0,0,0]
    for case in xrange(testcases):
        answer=func(701)
        cnt[answer-1]+=1
    for i in xrange(3):
        print "Got %d %f%% of the time"%(i+1,float(cnt[i])/testcases*100)

It isn't pretty, but it is the first thing that came to mind, and appears to work as expected.

What this does is to get a random value in the interval [0,1) (using random.random()). It then uses whether the random value falls in the intervals [0,0.2),[0.2,0.5) or [0.5,1), to figure out which value to return.

Upvotes: 1

Aaron
Aaron

Reputation: 7098

If your percent values will not be more precise than whole percent values, use a random number generator to generate a number 0-99.

Then in your function, use (programmatic) cases to choose the correct number. For example (clean this up):

if 701
  if random_num < 20
    return 1
  else if random number < 50   // ( 20 + 30 )
    return 2
  else if random number < 100  // ( 20 + 30 + 50 )
    return 3
  else
    // error

Upvotes: 1

James
James

Reputation: 25533

Use a discrete uniform distribution from the random module over a sufficient number of values, then partition it:

For example, for case 701 use a distribution over 10 values, for 2 values return 1, for another 3, return 2, and for the other 5, return 3.

You can build any distribution using enough uniform distributions :)

Upvotes: 2

Related Questions