hossein karimi
hossein karimi

Reputation: 31

Select row at random based on the "weight" of that row

I have a table like this:

ID chance
1 1
2 2
3 4
4 1

Now I need to choose a rand() from this table

SELECT * FROM table
ORDER BY RAND()
LIMIT 1

But ID #2 has double the chances to be selected compared to ID #1 AND 4. Likewise ID #3 has four times the chances to be selected compared to ID #1 AND 4.

Somewhat similar to lottery based on chance.

Upvotes: 3

Views: 157

Answers (3)

ABelikov
ABelikov

Reputation: 612

If you need in clear MySQL solution you can use this:

SELECT id FROM `table` ORDER BY -LOG(1-RAND())/chance LIMIT 1

Here is about choosing a random number from the exponential distribution http://www.tushar-mehta.com/publish_train/xl_vba_cases/0806%20generate%20random%20numbers.shtml

Simple code "just for test"

$sql = "SELECT id FROM `table` ORDER BY -LOG(1-RAND())/chance LIMIT 1";
$Res=array();
for ($i=0;$i<10000;$i++) {
    $result = mysqli_query($db,$sql);
    $row=mysqli_fetch_array($result, MYSQLI_ASSOC);
    if (isset($row['id'])) {
       echo "$i. => ".($row['id'])."\n";
       if (!isset($Res[$row['id']])) $Res[$row['id']]=0;
       $Res[$row['id']]++;
    } else {
        echo ' error.432 ';exit;
    }
}

print_r($Res);

You will see that "2" twice more often than "4" or "1". And "3" twice more often than "2"

Upvotes: 0

lolbas
lolbas

Reputation: 794

This is how lottery works in some of the games. Given table similar to your example (say, we also have chance column that indicated value-based possibility of getting this particular reward), the algorithm is:

  1. Calculate total lottery value (in your example it is 1 + 2 + 4 + 1 = 8).
  2. Generate a value that is from range 1..max (max is 8 in current example) inclusive.
  3. Iterate over all items from rewards list to find the one item where sum of all previous chances is more than generated number but less or equal than

Say, we have generated number 5. Our comparison steps are:

  1. 0 < 5 <= (0) + 1 is false so ID1 is not what we get. Left side is 0 because we start our calculations from 0.
  2. 1 < 5 <= (1) + 2 is false so ID2 is not what we get.
  3. 1 + 2 < 5 <= (1 + 2) + 4 is true so we get ID3.

Example in JavaScript:

var rewards = [
  { id: 1, chance: 1 },
  { id: 2, chance: 2 },
  { id: 3, chance: 4 },
  { id: 4, chance: 1 }
];

function getRandomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

function generate() {
  var sum = 0;
  var next_sum = 0;
  var random = getRandomInt(1, rewards.reduce(function(pv, cv) {
    return pv + cv.chance;
  }, 0));

  for (var i = 0; i < rewards.length; i++) {
    next_sum = sum + rewards[i].chance;
    if ((random > sum) && (random <= next_sum)) {
      return rewards[i].id;
    }
    sum += rewards[i].chance;
  }
}

var winnerCounts = {}, i, winner;
for (i = 0; i < 8000; i++) {
  winner = generate();
  winnerCounts[winner] = (winnerCounts[winner] || 0) + 1;
}
console.log("Number of times each id was selected after %d itrations", i);
console.log(winnerCounts);

Upvotes: 2

Valdrinium
Valdrinium

Reputation: 1416

Here is SQL Fiddle with a MySQL-only solution

select * from (
  select id, @running_total as previous_total, @running_total := @running_total + chance AS running_total, until.rand
  from (
    select round(rand() * init.max) as rand from (
      select sum(chance) - 1 as max from demo
    ) as init
  ) as until,
  demo,
  ( select @running_total := 0.00 ) as vars
) as results
where results.rand >= results.previous_total and results.rand < results.running_total

The algorithm is as follows:

  1. Find out the sum of all the chances and store it in max
  2. Generate a random number in the interval [0, max)
  3. For every row, compute a previous_total (initially 0) and a current_total of the chances encountered so far
  4. Keep only the row where the generated number is in the interval [previous_total, current_total)

Because we have an even chance to pick each number in the interval [0, sum_of_all_chances), then we can asign every entry as many numbers in this interval as it has chances to get picked, ensuring an even distribution.

@running_total is just a MySQL variable and I have used ( select @running_total := 0.00 ) as vars only as a way to give it an initial value. Also, I have used ( select round(rand() * init.max) as rand from ( select sum(chance) - 1 as max from demo ) as init ) as until just as a way to sum up the chances and store the random number generated by MySQL's rand function. Hope this makes the code easy to understand.

Upvotes: 2

Related Questions