Reputation: 18219
Consider two one-dimensional arrays: one holding the items to select from, and one holding the probability of drawing the corresponding item.
items = ["a", 2, 5, "h", "hello", 3]
weights = [0.1, 0.1, 0.2, 0.2, 0.1, 0.3]
In Julia, how can one randomly select an item from items, using weights as the probability of drawing each item?
Upvotes: 39
Views: 11381
Reputation: 885
Here's a much simpler approach which only uses Julia's base library:
sample(items, weights) = items[findfirst(cumsum(weights) .> rand())]
Example:
julia> sample(["a", 2, 5, "h", "hello", 3], [0.1, 0.1, 0.2, 0.2, 0.1, 0.3])
"h"
This is less efficient than StatsBase.jl, but for small vectors it's fine.
Also, if weights is not a normalized vector, you can do:
sample(items, weights) = items[findfirst(cumsum(weights) .> rand() * sum(weights))]
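One caveat with the one-liners above: because of floating-point rounding, the last element of cumsum(weights) can in principle end up slightly below the threshold, in which case findfirst returns nothing and the indexing fails. Here is a minimal sketch of a loop-based variant that guards against this, still using only the base library. The name weighted_pick is hypothetical (chosen to avoid clashing with StatsBase's sample), and falling back to the last item is an assumption about how to handle the rounding case, not something from the answer.

```julia
# Hypothetical helper (not from the answer): walk the running total instead of
# allocating cumsum(weights), and guard against floating-point round-off.
function weighted_pick(items, weights)
    length(items) == length(weights) ||
        throw(ArgumentError("items and weights must have the same length"))
    isempty(items) && throw(ArgumentError("items must not be empty"))

    threshold = rand() * sum(weights)   # scaling also covers unnormalized weights
    running = zero(threshold)
    for (item, w) in zip(items, weights)
        running += w
        running > threshold && return item
    end
    # Reached only if rounding leaves the running total at or below the threshold.
    # Assumption: returning the last item is acceptable in that rare case.
    return last(items)
end

items = ["a", 2, 5, "h", "hello", 3]
weights = [0.1, 0.1, 0.2, 0.2, 0.1, 0.3]
weighted_pick(items, weights)
```

Like the one-liner, this is linear in the number of items per draw, so it is still only meant for small vectors.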
Upvotes: 7
Reputation: 11654
Use the StatsBase.jl package, i.e.
import Pkg
Pkg.add("StatsBase") # Only needed once per environment
using StatsBase
items = ["a", 2, 5, "h", "hello", 3]
weights = [0.1, 0.1, 0.2, 0.2, 0.1, 0.3]
sample(items, Weights(weights))
Or if you want to sample many:
# With replacement
my_samps = sample(items, Weights(weights), 10)
# Without replacement
my_samps = sample(items, Weights(weights), 2, replace=false)
(In older versions of StatsBase, from before Julia 1.0, Weights was called WeightVec.)
You can learn more about Weights and why it exists in the docs. The sampling algorithms in StatsBase are very efficient and designed to use different approaches depending on the size of the input.
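If you want to convince yourself that the weights are being honored, a quick sanity check is to draw many samples and compare the observed frequencies with the weights. This is only a sketch: the sample size n = 100_000 and the output formatting are arbitrary choices, and it assumes StatsBase is already installed.

```julia
using StatsBase

items = ["a", 2, 5, "h", "hello", 3]
weights = [0.1, 0.1, 0.2, 0.2, 0.1, 0.3]

n = 100_000   # arbitrary; larger n gives frequencies closer to the weights
draws = sample(items, Weights(weights), n)

# Print the empirical frequency of each item next to the weight it was given.
for (item, w) in zip(items, weights)
    freq = count(isequal(item), draws) / n
    println(rpad(repr(item), 8), " weight = ", w, "  observed ≈ ", round(freq, digits=3))
end
```

The observed values vary from run to run, but with this many draws they should land close to the corresponding weights.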
Upvotes: 42