Andrey
Andrey

Reputation: 1496

Rate (probability) based selection

Let's suppose I have a structured array like that:

[
  'A' => 2,
  'B' => 0,
  'C' => 0,
  'D' => 1,
  'E' => 1,
  'F' => 0
]

I'll refer to this structure as "categories", so, I have six categories in this array. And my goal is to pick a random product based in a category.

I want to do a rate-based category selection, and as I know, I have to calculate how many percent this category represents in array, for example:

<?php

// ...

$total = array_sum($a);

array_map(function ($hits) use ($total) {
  return $hits / $total;
}, ...);

This will give me something like that:

(
    [A] => 0.5  (50%)
    [B] => 0
    [C] => 0
    [D] => 0.25 (25%)
    [E] => 0.25 (25%)
    [F] => 0
)

Okay, now I have to do a simple algorithm to get the category based on those rates; I think I need now to pick a random number between range (0, 1), and make some "slices", for example:

0    .. 0.50   => A
0.50 .. 0.25   => D
0.75 .. 1      => E

And if the random number is between 0 and 0.50, I will pick category A, if between 0.50 and 0.75 then D, if between 0.75 and 1 then E, of course, is what I'm doing right now.

The problem

If I go this way, I'm totally saying mathematically and logically that I'll never get B, C nor F, because there is no hits on those categories (no slices then.)

How I can avoid this? I have to give some chances to those categories, but minimal (which means is not impossible).

Upvotes: 0

Views: 132

Answers (2)

hindmost
hindmost

Reputation: 7195

You can use a distribution array which will have each category repeated hits times. Then you can simply get a random element from that array.

Somehow like this:

$distr = array();
array_walk($a, function ($hits, $cate) use ($distr) {
  $distr = array_merge($distr, array_fill(0, $hits, $cate));
});

$index = mt_rand(0, count($distr) - 1);
$random_cate = $distr[$index];

Upvotes: 2

Timothy Shields
Timothy Shields

Reputation: 79521

What you have is a random variable X that will take one of the values in S = {A, B, C, D, E, F}.

P(X = A) = 1/2
P(X = B) = 0
etc.

Define a new uniform random variable Y for which P(Y = A) = P(Y = B) = ... = P(Y = F) = 1 / |S| and a random variable Z = X if T = 0 and Z = Y if T = 1, where T is a Bernoulli random variable with P(T = 1) = t and P(T = 0) = 1 - t.

Then for all s in S,

P(Z = s) = P(Z = s | T = 0) P(T = 0) + P(Z = s | T = 1) P(T = 1) = (1 - t) P(X = s) + t / |S|

Using this model, all you need to choose is the parameter t in [0,1], where t indicates the probability of a uniform random selection out of S. t = 0 is your current model, where B, C, and F will never occur.

Upvotes: 1

Related Questions