Reputation: 133
I'm currently developing stochastic optimization algorithms and have encountered the following issue (which I imagine appears also in other places): It could be called totally unstable partial sort:
Given a container of size n and a comparator, such that entries may be equally valued. Return the best k entries, but if values are equal, it should be (nearly) equally probable to receive any of them.
(output order is irrelevant to me, i.e. equal values completely among the best k need not be shuffled. To even have all equal values shuffled is however a related, interesting question and would suffice!)
A very (!) inefficient way would be to use shuffle_randomly
and then partial_sort
, but one actually only needs to shuffle the block of equally valued entries "at the selection border" (resp. all blocks of equally valued entries, both is much faster). Maybe that Observation is where to start...
I would very much prefer, if someone could provide a solution with STL algorithms (or at least to a large portion), both because they're usually very fast, well encapsulated and OMP-parallelized.
Thanx in advance for any ideas!
Upvotes: 3
Views: 797
Reputation: 1
If you don't mind sorting the whole list, there is a simple answer. Randomize the result in your comparator for equivalent elements.
std::sort(validLocations.begin(), validLocations.end(),
[&](const Point& i_point1, const Point& i_point2)
{
if (i_point1.mX == i_point2.mX)
{
return Rand(1.0f) < 0.5;
}
else
{
return i_point1.mX < i_point2.mX;
}
});
Upvotes: 0
Reputation: 241861
If you really mean that output order is irrelevant, then you want std::nth_element
, rather than std::partial_sort
, since it is generally somewhat faster. Note that std::nth_element
puts the nth element in the right position, so you can do the following, which is 100% standard algorithm invocations (warning: not tested very well; fencepost error possibilities abound):
template<typename RandomIterator, typename Compare>
void best_n(RandomIterator first,
RandomIterator nth,
RandomIterator limit,
Compare cmp) {
using ref = typename std::iterator_traits<RandomIterator>::reference;
std::nth_element(first, nth, limit, cmp);
auto p = std::partition(first, nth, [&](ref a){return cmp(a, *nth);});
auto q = std::partition(nth + 1, limit, [&](ref a){return !cmp(*nth, a);});
std::random_shuffle(p, q); // See note
}
The function takes three iterators, like nth_element
, where nth
is an iterator to the nth element, which means that it is begin() + (n - 1))
.
Edit: Note that this is different from most STL algorithms, in that it is effectively an inclusive range. In particular, it is UB if nth == limit
, since it is required that *nth
be valid. Furthermore, there is no way to request the best 0
elements, just as there is no way to ask for the 0th element with std::nth_element
. You might prefer it with a different interface; do feel free to do so.
Or you might call it like this, after requiring that 0 < k <= n
:
best_n(container.begin(), container.begin()+(k-1), container.end(), cmp);
It first uses nth_element
to put the "best" k
elements in positions 0..k-1
, guaranteeing that the kth element (or one of them, anyway) is at position k-1
. It then repartitions the elements preceding position k-1
so that the equal elements are at the end, and the elements following position k-1
so that the equal elements are at the beginning. Finally, it shuffles the equal elements.
nth_element
is O(n)
; the two partition
operations sum up to O(n)
; and random_shuffle
is O(r)
where r
is the number of equal elements shuffled. I think that all sums up to O(n)
so it's optimally scalable, but it may or may not be the fastest solution.
Note: You should use std::shuffle
instead of std::random_shuffle
, passing a uniform random number generator through to best_n
. But I was too lazy to write all the boilerplate to do that and test it. Sorry.
Upvotes: 1
Reputation: 31
Not fully understanding your issue, but if you it were me solving this issue (if I am reading it correctly) ...
Since it appears you will have to traverse the given object anyway, you might as well build a copy of it for your results, sort it upon insert, and randomize your "equal" items as you insert.
In other words, copy the items from the given container into an STL list but overload the comparison operator to create a B-Tree, and if two items are equal on insert randomly choose to insert it before or after the current item.
This way it's optimally traversed (since it's a tree) and you get the random order of the items that are equal each time the list is built.
It's double the memory, but I was reading this as you didn't want to alter the original list. If you don't care about losing the original, delete each item from the original as you insert into your new list. The worst traversal will be the first time you call your function since the passed in list might be unsorted. But since you are replacing the list with your sorted copy, future runs should be much faster and you can pick a better pivot point for your tree by assigning the root node as the element at length() / 2.
Hope this is helpful, sounds like a neat project. :)
Upvotes: 2
Reputation: 146968
You want to partial_sort
first. Then, while elements are not equal, return them. If you meet a sequence of equal elements which is larger than the remaining k, shuffle and return first k. Else return all and continue.
Upvotes: 3