Reputation: 3391
I'm looking for a way to implement a diversified sort over an array. Each cell contains a weight value along with an enum type. I would like the sort to adjust each weight dynamically according to the types of the elements already chosen, giving priority to the types chosen least so far. I would also like to control a diversity factor: a high value should produce a fully diverse results array, and a low value should produce an almost 'regular' sorted array.
This doesn't sound like a very specific use case, so any references to known algorithms would also be great.
Update: Following Ophir's suggestion, this might be a basic wrapper:
// these will be the three arrays, one per type
$contentTypeA, $contentTypeB, $contentTypeC;
// sort each by value
sort($contentTypeA);
sort($contentTypeB);
sort($contentTypeC);
// while i didn't get the amount I want or there aren't any more options to chose from
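while ($amountChosen < 100 && (count($contentTypeA) + count($contentTypeB) + count($contentTypeC) > 0)) {
    // selectBest() already declares the arrays by reference, so no & at the call site
    // (call-time pass-by-reference is a fatal error since PHP 5.4)
    $diversifiedContent[] = selectBest($bestA, $bestB, $bestC, $contentTypeA, $contentTypeB, $contentTypeC);
    $amountChosen++;
}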
$diversifiedContent = array_slice($diversifiedContent, 0, 520);
return $diversifiedContent;
}
function selectBest($bestA, $bestB, $bestC, &$contentTypeA, &$contentTypeB, &$contentTypeC) {
    // how many items of each type have been selected so far
    static $typeSelected = array('A' => 0, 'B' => 0, 'C' => 0);
    $diversifyFactor = 0.5;
    // the (?) conditions are the open part of the question
    if (?) {
        $typeSelected['A']++;
        array_shift($contentTypeA);
        return $bestA;
    }
    else if (?) {
        $typeSelected['B']++;
        array_shift($contentTypeB);
        return $bestB;
    }
    else if (?) {
        $typeSelected['C']++;
        array_shift($contentTypeC);
        return $bestC;
    }
}
Upvotes: 4
Views: 317
Reputation: 6814
Here's an idea:
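class item(object):
    def __init__(self, enum_type, weight):
        self.enum_type = enum_type
        self.weight = weight
        self.dyn_weight = weight  # weight after the diversity penalty

    def __repr__(self):
        return repr((self.enum_type, self.weight, self.dyn_weight))

def sort_diverse(lst, factor):
    # reset penalties so repeated calls on the same list do not accumulate
    for obj in lst:
        obj.dyn_weight = obj.weight
    # first sort: group by type, then by weight within each type
    by_type = sorted(lst, key=lambda obj: (obj.enum_type, obj.weight))
    cnt = 1
    for i in range(1, len(by_type)):
        current = by_type[i]
        previous = by_type[i-1]
        if current.enum_type == previous.enum_type:
            # penalize the cnt-th repeat of a type by factor * cnt
            current.dyn_weight += factor * cnt
            cnt += 1
        else:
            cnt = 1
    # second sort: by penalized weight, ties broken by type
    return sorted(by_type, key=lambda obj: (obj.dyn_weight, obj.enum_type))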
Try this example:
lst = [item('a', 0) for x in range(10)] + [item('b', 1) for x in range(10)] + [item('c', 2) for x in range(10)]
print(sort_diverse(lst, 0))    # regular sort
print(sort_diverse(lst, 1))    # partially diversified
print(sort_diverse(lst, 100))  # completely diversified
Depending on your needs, you might want to use a more sophisticated weight update function.
This algorithm runs in O(n log n) time and O(n) space, as it requires two sorts and two copies of the list.
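To make the "more sophisticated weight update function" remark a bit more concrete, here is a rough sketch where the penalty is passed in as a parameter. The names `sort_diverse_with` and `penalty` are made up for this illustration, and it works on plain `(enum_type, weight)` tuples rather than the class above. As in the code above, a lower weight sorts first.

```python
import math
from collections import defaultdict

def sort_diverse_with(pairs, factor, penalty=lambda rank: rank):
    """Sort (enum_type, weight) pairs by weight + factor * penalty(rank).

    rank is the 0-based position of a pair among the pairs of the same type,
    ordered by weight. `penalty` is a hypothetical hook for this sketch.
    """
    seen = defaultdict(int)  # how many pairs of each type were ranked so far
    keyed = []
    for enum_type, weight in sorted(pairs, key=lambda p: (p[1], p[0])):
        rank = seen[enum_type]
        seen[enum_type] += 1
        keyed.append((weight + factor * penalty(rank), enum_type, weight))
    keyed.sort()  # by penalized weight, ties broken by type
    return [(enum_type, weight) for _, enum_type, weight in keyed]

if __name__ == "__main__":
    pairs = [('a', 0)] * 3 + [('b', 1)] * 3 + [('c', 2)] * 3
    print(sort_diverse_with(pairs, 0))                       # no penalty
    print(sort_diverse_with(pairs, 100))                     # linear penalty
    print(sort_diverse_with(pairs, 1, penalty=math.log1p))   # flatter penalty
```

A logarithmic penalty such as `math.log1p` grows more slowly than the linear one, so repeated types are pushed back less aggressively. Which shape is right depends on the application.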
Upvotes: 1
Reputation: 624
Your definition is stated in very general terms rather than mathematical ones, so I doubt you will find a solution that matches exactly what you want. I can suggest this simple approach:
Sort each type separately. Then merge the lists by iteratively taking the maximum value in the list of highest priority, where priority is the product of the value and a "starvation" factor for that type. The starvation factor will be a combination of how many steps ignored that type, and the diversity factor. The exact shape of this function depends on your application.
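A minimal sketch of this merge in Python, under a few assumptions of mine: weights are positive and higher is better, and the starvation factor is `(1 + steps_ignored) ** diversity`. That is only one possible shape for the function. The names `diversified_merge`, `diversity` and `limit` are hypothetical.

```python
def diversified_merge(items, diversity, limit=None):
    """Greedy merge of (enum_type, weight) pairs; higher weight is better.

    priority = best remaining weight * (1 + steps_ignored) ** diversity
    (an assumed starvation function; weights are assumed to be positive).
    """
    # Sort each type separately, best weight last so that pop() is cheap.
    queues = {}
    for enum_type, weight in items:
        queues.setdefault(enum_type, []).append(weight)
    for weights in queues.values():
        weights.sort()

    starved = dict.fromkeys(queues, 0)  # steps since each type was last chosen
    result = []
    while queues and (limit is None or len(result) < limit):
        # Take the type whose best remaining item has the highest priority.
        chosen = max(
            queues,
            key=lambda t: queues[t][-1] * (1 + starved[t]) ** diversity,
        )
        result.append((chosen, queues[chosen].pop()))
        if not queues[chosen]:
            del queues[chosen]  # this type is exhausted
        for t in queues:
            starved[t] = 0 if t == chosen else starved[t] + 1
    return result

if __name__ == "__main__":
    items = [('a', 10), ('a', 9), ('a', 8), ('b', 5), ('b', 4), ('c', 3)]
    print(diversified_merge(items, 0))   # diversity 0: plain descending sort
    print(diversified_merge(items, 10))  # high diversity: types alternate
```

With `diversity = 0` the starvation factor is always 1, so this degenerates to a regular sort by weight. The larger the value, the more the number of ignored steps dominates the weight.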
Upvotes: 2