Rhona
Rhona

Reputation: 45

Ruby - Pick one element from array by possibility

I have an array with 3 elements and I want to pick one and add that into another array base on possibility.

For example, num 1 has 5% chance to be picked, num 2 has 60% chance to be picked and num 3 has 35% chance to be picked.

arr = [{:num=>1, :diff=>-29}, {:num=>2, :diff=>5}, {:num=>3, :diff=>25}]

I found below methods from stackoverflow, just wondering if this would work? Or there is another way to do it?

def get_num(arr)
    case rand(100) + 1
    when 1..5
        p arr[0]
    when 6..65
        p arr[1]
    when 66..100
        p arr[2]
    end
end

get_num(arr)


Thanks!

Upvotes: 2

Views: 516

Answers (2)

Cary Swoveland
Cary Swoveland

Reputation: 110675

Your code is fine but here are two other approaches.

Use a cumulative distribution function ("CDF")

CDF = [[0.05,0], [0.05+0.60,1], [0.5+0.60+0.35,2]]
  #=> [[0.05,0], [0.65,1], [1.0,2]]
def get_num(arr)
  n = rand
  arr[CDF.find { |mx,_idx| n <= mx }.last]
end
arr = [{:num=>1, :diff=>-29}, {:num=>2, :diff=>5}, {:num=>3, :diff=>25}]
get_num(arr)
  #=> {:num=>2, :diff=>5}
get_num(arr)
  #=> {:num=>2, :diff=>5}
get_num(arr)
  #=> {:num=>3, :diff=>25}
get_num(arr)
  #=> {:num=>1, :diff=>-29}
get_num(arr)
  #=> {:num=>2, :diff=>5}

Suppose:

n = rand
  #=> 0.5385005480168696

then

a = CDF.find { |mx,_idx| n <= mx }
  #=> [0.65,1]
i = a.last
  #=> 1
arr[i]
  #=> {:num=>2, :diff=>5}

Note that I've followed the convention of beginning the name of find's second block variable (_idx) with an underscore to signal to the reader that that block variable is not used in the block calculation. Often just an underscore (_) is used.

Now consider the fraction of times each element of arr will be randomly-drawn if n draws are made:

def outcome_fractions(arr, n)
  n.times
   .with_object(Hash.new(0)) { |_,h| h[get_num(arr)] += 1 }
   .transform_values { |v| v.fdiv(n) }
end

Randomly select from an array of indices

outcome_fractions(arr, 1_000)
  #=> {{:num=>2, :diff=>5}  =>0.612,
  #    {:num=>3, :diff=>25} =>0.328,
  #    {:num=>1, :diff=>-29}=>0.06}
outcome_fractions(arr, 100_000)
  #=> {{:num=>3, :diff=>25} =>0.34818,
  #    {:num=>1, :diff=>-29}=>0.04958,
  #    {:num=>2, :diff=>5}  =>0.60224}

Notice that the fraction of each hash that is randomly drawn approaches its specified population probability as the sample size is increased (though the "pseudo-random" draws are not truly random).

Do not be concerned with how outcome_fractions works.


Here is another way that is more efficient (because it does not use find, which performs a linear search) but uses more memory.

CHOICE = [*[0]*5, *[1]*60, *[2]*35]
  #=> [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  #    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  #    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  #    1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
  #    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
  #    2, 2, 2, 2, 2]
def get_num(arr)
  arr[CHOICE[rand(100)]]
end
  #=> {{:num=>2, :diff=>5} =>0.60029,
  #    {:num=>3, :diff=>25}=>0.35022,
  #    {:num=>1, :diff=>-29}=>0.04949}

Note that:

[*[0]*5, *[1]*60, *[2]*35]

produces the same array as

[[0]*5, [1]*60, [2]*35].flatten

The first * in *[0]*5 is the splat operator; the second is the method Array#*. [0]*5 #=> [0,0,0,0,0] is evaluated first.

CHOICE has 100 elements. If the three probabilities were, say, 0.048, 0.604 and 0.348, CHOICE would have 10**3 #=> 1_000 elements (48 zeros, 604 ones and 348 twos).

Upvotes: 6

Stefan
Stefan

Reputation: 114138

Here's a small variation / addition to Cary's great answer.

Instead of calculating the cumulative sums yourself, you can let Ruby build it for you out of the initial probabilities:

probs = [5, 60, 35]

sum = 0
sums = probs.map { |x| sum += x }
#=> [5, 65, 100]

we can now calculate a random number between 0 and the total sum and find the corresponding index:

r = rand(sum)                  #=> 37
sums.find_index { |i| r < i }  #=> 1

Note that the initial probabilities don't have to sum to 100. instead of [5, 60, 35] you could also use:

probs = [1, 12, 7]

You can wrap the above code into a method:

def random_index(*probs)
  sum = 0
  sums = probs.map { |x| sum += x }
  r = rand(sum)
  sums.find_index { |i| r < i }
end

random_index(5, 60, 35) #=> 1
random_index(5, 60, 35) #=> 1
random_index(5, 60, 35) #=> 2

You could also make the method return a proc / lambda that can be reused:

def random_index_proc(*probs)
  sum = 0
  sums = probs.map { |x| sum += x }
  -> {
    r = rand(sum)
    sums.find_index { |i| r < i }
  }
end

prc = random_index_proc(5, 60, 35)

prc.call #=> 1
prc.call #=> 1
prc.call #=> 0

Last not least, you can also pre-populate an array this way: (using Cary's naming convention)

CHOICE = [5, 60, 35].flat_map.with_index { |v, i| [i] * v }

and get a random element via:

def get_num(arr)
  arr[CHOICE.sample]
end

To keep the array small, you should prefer [1, 12, 7] (20 elements) over [5, 60, 35] (100 elements). With a little help from gcd you don't even have to calculate it yourself:

probs = [5, 60, 35]

gcd = probs.reduce { |a, b| a.gcd(b) }
#=> 5

probs.map { |i| i / gcd }
#=> [1, 12, 7]

Upvotes: 4

Related Questions