Aditya Singh
Aditya Singh

Reputation: 852

Fixed Proportionate Selection

I have a set of elements and i need to choose any one element out of it. Each element is associated with a percentage chance. The percentages add to 100.

I need to choose one out of those element so that the chances of an element being chosen is equal to the percent value. So if a element has 25% chance, it is supposed to have 25% chances of getting chosen. In other words, if we choose elements 1 mil times, that element should be chosen near 250k times.

Upvotes: 3

Views: 150

Answers (2)

lll
lll

Reputation: 12889

I guess it was faster for me to write it than it was for you to show us what you did so far.

Probably not the best solution, but as it stands, it looks like it's the only one you've got.

Here you go:

$elements = array(
    'This' => 25,
    'is' => 15,
    'a' => 15,
    'crappy' => 20,
    'list' => 25
);

asort($elements);
$elements = array_reverse($elements);

// Precalc cumulative value
$cumulant = 0;
foreach ($elements as $key => &$value) {
    $cumulant += $value;
    $value = $cumulant;
}

function pickAnElement($elements) {
    $random = rand(1, 100);
    foreach ($elements as $key => $value) {
        if ($random <= $value) {
            return $key;
        }
    }
}

$picks = array();

for ($i = 0; $i < 10000; $i++) {
    $element = pickAnElement($elements);
    if (!array_key_exists($element, $picks)) {
        $picks[$element] = 0;
    }
    $picks[$element]++;
}

var_dump($picks);

Inspired by Johans answer, I added a loop to sort and pre-calculate the cumulant.

Upvotes: 1

Johan Lundberg
Johan Lundberg

Reputation: 27038

What you describe is a multinomial process.

http://en.wikipedia.org/wiki/Multinomial_distribution#Sampling_from_a_multinomial_distribution

They way to generate such random process is like this: ( I'll use pseudo code but it should be easy to make it in to real code. )

  1. Sort the 'boxes' in reverse order of their probability: (not needed. it's just an optimization) so that you have for example values=[0.45,0.3,0.15,0.1]

  2. then create the 'cumulative' distribution, which is the sum of all elements with index <=i. pseudocode:

    cumulant=[0,0,0,0]    // initiate it
    s=0
    for j=0 to size()-1 {
       s=s+values[i] ; 
       cumulant[i]=s
    }
    

    in our case cumulant=[0.45,0.70,0.85 ,1 ]

  3. make a uniform random number x between 0 and 1. For php: http://php.net/manual/en/function.rand.php

  4. the resulting random box index i is

    the highest i for which cumulant[i]< x

pseudocode:

for j=0 to size()-1 {
  if !(cumulant[i]<){
     print "your index is ",i
     break;
  }

that is it. Get another random index i by going back to point 3.

if you sort like suggested above, that means that the final search will be faster. For example, if you have this vector of probabilities: 0.001 0.001 0.001 0.001 0.996 then, when you sort it, you will almost always only have to look only at index i=0, since the random number x will almost always be lower than 0.996. If the sort pays off or not depends on if you repeatedly use the same 'boxes'. So, yes with 250k tries it will help a lot. Just remember that the box index i you get is for the sorted vector.

Upvotes: 5

Related Questions