stevenchusm
stevenchusm

Reputation: 21

Return list of weighted objects with semi-randomized ranking

Let's say I have a list of objects (in Python) that looks something like this (contains an identifier and a ranking/weighting):

objects = [
    ("object_1", 0.50),
    ("object_2", 0.75),
    ("object_3", 0.25),
    ("object_4", 0.01),
    ("object_5", 0.99),
]

I would like to return this same objects array but in semi-randomised order of their weighting. That is, I don't always want to return:

[
    ("object_5", 0.99),
    ("object_2", 0.75),
    ("object_1", 0.50),
    ("object_3", 0.25),
    ("object_4", 0.01),
]

but would instead rather allow for some non-determinism so that, generally speaking, the returned array looks like the above but could also look like:

[
    ("object_5", 0.99),
    ("object_1", 0.50),
    ("object_2", 0.75),
    ("object_4", 0.01),
    ("object_3", 0.25),
]

EDIT: I think I'm asking a different question than this one because the ordering matters here; and in the other question the order doesn't matter (again, I think!).

Upvotes: 1

Views: 142

Answers (2)

stevenchusm
stevenchusm

Reputation: 21

If you are able to ensure that the weight values are always in between [0, 1), then the following code will work!

from random import random


def weighted_sample_without_replacement(
    population: List[Tuple[Any, float]], weights: tuple
) -> List[Tuple[Any, float]]:
    return sorted(population, key=lambda x: x[1] * random())

where population looks like:

[
    ("object_5", 0.99),
    ("object_2", 0.75),
    ("object_1", 0.50),
    ("object_3", 0.25),
    ("object_4", 0.01),
]

weights like:

(
    0.99,
    0.75,
    0.50,
    0.25,
    0.01,
)

Upvotes: 0

Dani Mesejo
Dani Mesejo

Reputation: 61920

If I'm not mistaken one approach could be to weighted sample without replacement:

from random import choices


def weighted_sample_without_replacement(population, weights, k=1):
    #    https://stackoverflow.com/a/43649323/4001592
    weights = list(weights)
    positions = range(len(population))
    indices = []
    while True:
        needed = k - len(indices)
        if not needed:
            break
        for i in choices(positions, weights, k=needed):
            if weights[i]:
                weights[i] = 0.0
                indices.append(i)
    return [population[i] for i in indices]


data = [
    ("object_5", 0.99),
    ("object_2", 0.75),
    ("object_1", 0.50),
    ("object_3", 0.25),
    ("object_4", 0.01),
]

_, weights = zip(*data)
sample = weighted_sample_without_replacement(data, weights, k=len(data))
print(sample)

Output (of a single run)

[('object_2', 0.75), ('object_5', 0.99), ('object_3', 0.25), ('object_1', 0.5), ('object_4', 0.01)]

A basic experimental analysis seems to validate my hypothesis:

from collections import defaultdict
from operator import itemgetter

_, weights = zip(*data)
counts = defaultdict(lambda : defaultdict(int))
for _ in range(1000):
    sample = weighted_sample_without_replacement(data, weights, k=len(data))
    for i, (key, _) in enumerate(sample):
        counts[i][key] += 1

for key, values in counts.items():
    print(key, sorted(values.items(), key=itemgetter(1), reverse=True))

Output (experiment)

0 [('object_5', 415), ('object_2', 290), ('object_1', 186), ('object_3', 106), ('object_4', 3)]
1 [('object_2', 322), ('object_5', 309), ('object_1', 241), ('object_3', 119), ('object_4', 9)]
2 [('object_1', 319), ('object_2', 259), ('object_3', 209), ('object_5', 199), ('object_4', 14)]
3 [('object_3', 533), ('object_1', 239), ('object_2', 126), ('object_5', 75), ('object_4', 27)]
4 [('object_4', 947), ('object_3', 33), ('object_1', 15), ('object_2', 3), ('object_5', 2)]

The value 'object_5' is in the two first positions 724 times out of 1000, while 'object_4' is in the last position 947 times out of 1000. For a better visualisation of the results, see the plot below (the visualisation was generated by an additional run of the experimental setup):

Distribution of positions by object

The code for reproducing the experiments can be found here.

Upvotes: 2

Related Questions