A.B. Carroll
A.B. Carroll

Reputation: 2514

Algorithm for probability when looping over a randomly ordered array

The problem is pretty straightforward, I think, by looking at the code. I have a randomized array (the array must be randomized, some code has been excluded because it doesn't pertain to the actual problem, but does require randomization). For each element in the array, there is a "probability" index (described here as the value itself, in $rules) that is suppose to hint that, if other conditions are met (that are removed here for the sake of non-relevancy), the probability that array element will be "triggered" (in this case, that the array element's score will increment by 1)

Consider the code:

<?php
  // Taken from php.net/shuffle user notes
  // Shuffles an array order for the sake of foreach while maintaining
  // key => value associations
  function shuffle_assoc(&$array) {
    $keys = array_keys($array);
    shuffle($keys);
    foreach($keys as $key) {
      $new[$key] = $array[$key];
    }
    return $new;
  }

  $i = 1000000; // How many tests to perform

  // This is my rule list.  Each key is a simple color
  // and each value is a probability represented as a percent
  $rules = array(
    'black' => 20,
    'white' => 10,
    'red' => 40,
    'green' => 5,
    'blue' => 25,
  );

  // Initialize the scores array with all 0's
  // The "outs" will be used when the probability does not
  // occur in any of the rules
  $scores = array('outs' => 0);
  foreach($rules as $k => $v) { 
    $scores[$k] = 0;
  }

  $count = count($rules);

  for($x = 0; $x < $i; $x++) { 
    $rules = shuffle_assoc($rules);

    foreach($rules as $k => $probability) {
      $rand = mt_rand(1,100);
      //$probability = ??; I've tried applying many different operations here to "correct" the probability

      if($rand > $probability) { 
        continue; 
      } else {
        $scores[$k]++;
        continue 2;
      }
    }
    $scores['outs']++;
  }


  foreach($scores as $k => $v) { 
    echo "$k: " . (($v/$i)*100) . "% ($v/$i)\n";
  }

?>

Expected output (pseudo). Note the percentages correspond with the values of $rules

outs: less than 1% (.../1000000)
black: 20% (.../1000000)
white: 10% (.../1000000)
red: 40% (.../1000000)
green: 5% (.../1000000)
blue: 25% (.../1000000)

Example output:

outs: 30.7128% (307128/1000000)
black: 13.2114% (132114/1000000)
white: 6.3381% (63381/1000000)
red: 29.5247% (295247/1000000)
green: 3.1585% (31585/1000000)
blue: 17.0545% (170545/1000000)

Things I've tried & Considerations:

Tell me there is a precise way to calculate this. It's driving me nuts.

Edit: I have a finalized version of my code, with the help from the two answers below, that does this without the need for knowing probability percentages before the loop begins, and no additional or nested loops (which is what I specifically needed, I guess I should of been more direct in that part) .. In the sense of, each iteration, you could be pulling the probability dynamically based on that specific iteration's properties.. All answers here were invaluable, here is my version of the final code: http://pastebin.com/eB3TVP1E

Upvotes: 5

Views: 2242

Answers (2)

Jon Hulka
Jon Hulka

Reputation: 1309

Jack's idea implemented in your code (if the sum of probabilities is >100 this won't work):

php fiddle

<?php
  // Taken from php.net/shuffle user notes
  // Shuffles an array order for the sake of foreach while maintaining
  // key => value associations
  function shuffle_assoc(&$array) {
    $keys = array_keys($array);
    shuffle($keys);
    foreach($keys as $key) {
      $new[$key] = $array[$key];
    }
    return $new;
  }

  $i = 1000000; // How many tests to perform

  // This is my rule list.  Each key is a simple color
  // and each value is a probability represented as a percent
  $rules = array(
    'black' => 20,
    'white' => 10,
    'red' => 40,
    'green' => 5,
    'blue' => 25,
  );

  // Initialize the scores array with all 0's
  // The "outs" will be used when the probability does not
  // occur in any of the rules
  $scores = array('outs' => 0);
  foreach($rules as $k => $v) { 
    $scores[$k] = 0;
  }

  $count = count($rules);
//$limits is what Jack called $rules_norm
$limits=array();
$limit=0;
foreach($rules as $k=>$v)
{
    $limit+=$v;
    $limits[$k]=$limit;
}
  for($x = 0; $x < $i; $x++) { 
      $rand = mt_rand(1,100);
foreach($limits as $k=>$v)
{
    if($v>=$rand)
    {
        $scores[$k]++;
        continue(2);
    }

}
    $scores['outs']++;
  }


  foreach($scores as $k => $v) { 
    echo "$k: " . (($v/$i)*100) . "% ($v/$i)\n";
  }

?>

Upvotes: 2

Jack
Jack

Reputation: 133609

Just normalize the results, accumulate them and then you are done.

What I mean is:

  • sum all probabilities given for every item of the array to get the total (which is 100 in your case but it's easily generalizable)
  • divide every probability for the total

So for example:

$rules = array(
    'black' => 20,
    'white' => 10,
    'red' => 40,
    'green' => 5,
    'blue' => 25,
  );

will be normalized to:

$rules_norm = array(
    'black' => 0.2,
    'white' => 0.1,
    'red' => 0.4,
    'green' => 0.05,
    'blue' => 0.25,
  );
  • now accumulate the result so that for every element in $rules_norm you calculate the sum of all previous elements plus the current one.

So:

$rules_norm = array(
    'black' => 0.2,
    'white' => 0.3,
    'red' => 0.7,
    'green' => 0.75,
    'blue' => 1.0,
  );

Now with this you can just extract a random float number in range [0,1) and choose which elements are increased according to the result: to increment the score of one element just start from the first one in the array and increment the one such that $rand > $rules_norm[k]

Upvotes: 4

Related Questions