Reputation: 21
Let's say I have a list of objects (in Python) that looks something like this (contains an identifier and a ranking/weighting):
objects = [
("object_1", 0.50),
("object_2", 0.75),
("object_3", 0.25),
("object_4", 0.01),
("object_5", 0.99),
]
I would like to return this same objects list, but in a semi-randomised order based on the weightings. That is, I don't always want to return:
[
("object_5", 0.99),
("object_2", 0.75),
("object_1", 0.50),
("object_3", 0.25),
("object_4", 0.01),
]
but would instead rather allow for some non-determinism so that, generally speaking, the returned array looks like the above but could also look like:
[
("object_5", 0.99),
("object_1", 0.50),
("object_2", 0.75),
("object_4", 0.01),
("object_3", 0.25),
]
EDIT: I think I'm asking a different question than this one because the ordering matters here; and in the other question the order doesn't matter (again, I think!).
Upvotes: 1
Views: 142
Reputation: 21
If you are able to ensure that the weight values always lie in [0, 1), then the following will work: scale each weight by an independent uniform random factor and sort by the scaled value, descending, so that heavier items tend to come first (the reverse=True is needed to get a generally descending order):

from random import random
from typing import Any, List, Tuple

def weighted_sample_without_replacement(
    population: List[Tuple[Any, float]]
) -> List[Tuple[Any, float]]:
    # Multiply each weight by an independent Uniform[0, 1) draw and sort
    # descending: heavier items usually, but not always, end up earlier.
    return sorted(population, key=lambda x: x[1] * random(), reverse=True)

where population looks like:

[
    ("object_5", 0.99),
    ("object_2", 0.75),
    ("object_1", 0.50),
    ("object_3", 0.25),
    ("object_4", 0.01),
]
Upvotes: 0
Reputation: 61920
If I'm not mistaken, one approach could be to do a weighted sample without replacement:
from random import choices

def weighted_sample_without_replacement(population, weights, k=1):
    # https://stackoverflow.com/a/43649323/4001592
    weights = list(weights)
    positions = range(len(population))
    indices = []
    while True:
        needed = k - len(indices)
        if not needed:
            break
        # choices samples with replacement, so repeat until k distinct
        # positions have been drawn.
        for i in choices(positions, weights, k=needed):
            if weights[i]:
                # Zero out the weight so this position cannot be drawn again.
                weights[i] = 0.0
                indices.append(i)
    return [population[i] for i in indices]
data = [
    ("object_5", 0.99),
    ("object_2", 0.75),
    ("object_1", 0.50),
    ("object_3", 0.25),
    ("object_4", 0.01),
]
_, weights = zip(*data)
sample = weighted_sample_without_replacement(data, weights, k=len(data))
print(sample)
Output (of a single run)
[('object_2', 0.75), ('object_5', 0.99), ('object_3', 0.25), ('object_1', 0.5), ('object_4', 0.01)]
A basic experimental analysis seems to confirm that the output order follows the weights:
from collections import defaultdict
from operator import itemgetter

_, weights = zip(*data)
counts = defaultdict(lambda: defaultdict(int))
for _ in range(1000):
    sample = weighted_sample_without_replacement(data, weights, k=len(data))
    for i, (key, _) in enumerate(sample):
        counts[i][key] += 1

for key, values in counts.items():
    print(key, sorted(values.items(), key=itemgetter(1), reverse=True))
Output (experiment)
0 [('object_5', 415), ('object_2', 290), ('object_1', 186), ('object_3', 106), ('object_4', 3)]
1 [('object_2', 322), ('object_5', 309), ('object_1', 241), ('object_3', 119), ('object_4', 9)]
2 [('object_1', 319), ('object_2', 259), ('object_3', 209), ('object_5', 199), ('object_4', 14)]
3 [('object_3', 533), ('object_1', 239), ('object_2', 126), ('object_5', 75), ('object_4', 27)]
4 [('object_4', 947), ('object_3', 33), ('object_1', 15), ('object_2', 3), ('object_5', 2)]
The value 'object_5' is in the first two positions 724 times out of 1000, while 'object_4' is in the last position 947 times out of 1000. For a better visualisation of the results, see the plot below (generated by an additional run of the experimental setup):
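As a rough cross-check (my own addition, not part of the answer above): in sampling without replacement, the probability that an item is drawn first is its weight divided by the total weight, which lines up with the first row of the experiment (415/1000 observed for 'object_5'):

```python
# Expected probability of each item appearing in position 0: w_i / sum(w).
data = [
    ("object_5", 0.99),
    ("object_2", 0.75),
    ("object_1", 0.50),
    ("object_3", 0.25),
    ("object_4", 0.01),
]
total = sum(w for _, w in data)  # 2.50
first_draw = {name: w / total for name, w in data}
print(first_draw)  # 'object_5' ≈ 0.396, 'object_4' ≈ 0.004
```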
The code for reproducing the experiments can be found here.
Upvotes: 2