Ryan Leach
Ryan Leach

Reputation: 4470

Pick a random weighted element, with sample, no replacement

Given a structure representing a reward in a loot table, where a is the reward type, and 2 is an integer weighting that means that a is twice as likely to get pulled out then d.

Map{
  "a" -> 2
  "b" -> 2
  "c" -> 2
  "d" -> 1
  "e" -> 1
  "f" -> 1
}

How can I generate a sample for display purposes + a winner?

My current (pseudo) code:

list out;
foreach(entry:map){
  for(entry.value){
    out.add(a)
  }
}

Then to create a sample for display.

Collections.shuffle(out);
List display = out.stream()
  .distinct()
  .limit(8)
  .collect(Collectors.toList());

With this code, can I trust .distinct to not skew the odds, if I choose the winner by

winner = display.get(0);

I realize that getting the last element added will possibly skew the results, as after the distinct call happens, it will make it more likely to pick a number with a lower weighting.

but picking the first element of the stream should be trust worthy right? since it was chosen before .distinct had it's state inducing effect?

Upvotes: 2

Views: 1095

Answers (3)

nasukkin
nasukkin

Reputation: 2540

I like Martin's answer, but I'll also post my own as a caveat/alternative based on the performance concerns he raised. A very similar implementation to his own can be achieved with a Map (I'll use HashMap since it's my favorite).

private final AtomicLong idxCounter = new AtomicLong(0);
private final Map<Long, Item> dropTable = new HashMap<>();
public void addDrop(Item item, long relativeFrequency) {
    while (relativeFrequency-- > 0) {
        Long nextIdx = idxCounter.getAndIncrement();
        dropTable.put(nextIdx, item);
    }
}

private static final Random rng = new Random(System.currentTimeMillis());
public Item getRandomDrop() {
    Long size = idxCounter.get();
    // randomValue will be something in the interval [0, size), which 
    // should cover the whole dropTable.
    // See http://stackoverflow.com/questions/2546078 for a fair
    // implementation of nextLong.
    Long randomValue = nextLong(rng, size); 
    return dropTable.get(randomValue); 
}

Getting a value by key from a HashMap is very fast. You could optimize it further by specifying the dropTable initial capacity and load factor (see the javadoc for HashMap) but that's up to your own judgement.

It's also thread-safe so long as nothing else is toying with the dropTable!

Upvotes: 1

Martin Milichovsky
Martin Milichovsky

Reputation: 730

Look at Stochastic universal sampling and Fitness proportionate selection. The simple approach for taking one sample according to the weights can be explained by representing each element as an interval on with length proportionate to its weight. E.g:

Map{
  "a" -> 2 // weight 2
  "b" -> 2
  "c" -> 2
  "d" -> 1
  "e" -> 1
  "f" -> 1
}
=>
Map{
  "a" -> (0,2) // weight 2 -- is now length of the interval
  "b" -> (2,4) // ...
  "c" -> (4,6)
  "d" -> (6,7)
  "e" -> (7,8)
  "f" -> (8,9)
}

Then you pick random number from 0 to 9 9*Math.random() (as a pointer to the range) and check which interval it belongs to -- this is your random sample w.r.t the input weights. Repeat until you get the desired number of samples (and ignore duplicates, if you wish)...

Of course this is a bit idiomatic explanation, in the real code you would keep just the upper bound, since the lower is just the upperof previous element. And then you'd pick the first element that has the bound above the random pointer.


Update: Your original approach with repeating the elements is OK from the mathematical point of view (probability of picking elament with double weight is double), but it would be an issue when the weights are high: Map{"a"->1000 "b"->100000}. Also it wouldn't handle real-valued weights well.

Upvotes: 1

Eric
Eric

Reputation: 121

Your data structure implementation seems a little bit odd. I would do something like this:

Map{
  0 -> "a"
  2 -> "b"
  4 -> "c"
  5 -> "d"
  6 -> "e"
  7 -> "f"
}

Then, to make things faster (or to allow for a very large loot table), I'd have a value like int maxValue = 7. Now, to get a loot item from the table, I can just call for a random integer lootDrop between 0 and maxValue (inclusive). Then I can iterate through my table to find the largest value less than or equal to lootdrop. If you needed to keep the map as string to integer mappings, and have control over the integer mappings, it's fairly trivial to do that as well.

If you don't want to go that far, you could simply get a random integer between 0 and 8 in your solution, which will still work.

Is there a reason you're insisting on this formulation?

Upvotes: 0

Related Questions