Reputation: 750
Given a C++ vector (let's say it's of doubles, and let's call it `unsorted`), what is the most performant way to create a new vector `sorted` which contains a sorted copy of `unsorted`?
Consider the following naïve solution:
std::vector<double> sorted = unsorted;
std::sort(sorted.begin(), sorted.end());
This solution has two steps:
1. Copy the entire `unsorted` vector into `sorted`.
2. Sort `sorted` in place.
However, there is potentially a lot of wasted effort in the initial copy of step 1, particularly for a large vector that is (for example) already mostly sorted.
Were I writing this code by hand, I could combine the first pass of my sorting algorithm with step 1, by having the first pass read the values from the `unsorted` vector while writing them, partially sorted as necessary, into `sorted`. Depending on the algorithm, subsequent steps might then work just with the data in `sorted`.
Is there a way to do such a thing with the C++ standard library, Boost, or a third-party cross-platform library?
One important point would be to ensure that the memory for the `sorted` vector isn't needlessly initialized to zeroes before the sorting begins. Many sorting algorithms would require immediate random-write access to the `sorted` vector, so using `reserve()` and `push_back()` won't work for that first pass, yet `resize()` would waste time initializing the vector.
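One way to get a `resize()` that skips the zero-fill, sketched below under the assumption that a custom allocator is acceptable (the adaptor name `default_init_allocator` and the helper `sorted_copy` are mine, not part of the question), is to override the allocator's `construct()` so elements are default-initialized, which for `double` is a no-op:

```cpp
#include <algorithm>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Allocator adaptor whose construct() default-initializes instead of
// value-initializing, so resize() does not zero-fill the doubles before
// the sort writes them.
template <class T, class A = std::allocator<T>>
struct default_init_allocator : A {
    template <class U> struct rebind {
        using other = default_init_allocator<
            U, typename std::allocator_traits<A>::template rebind_alloc<U>>;
    };
    using A::A;

    // Default-initialization: for double this leaves the memory untouched.
    template <class U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible_v<U>) {
        ::new (static_cast<void*>(p)) U;
    }
    // Forward all other constructions to the underlying allocator.
    template <class U, class... Args>
    void construct(U* p, Args&&... args) {
        std::allocator_traits<A>::construct(static_cast<A&>(*this), p,
                                            std::forward<Args>(args)...);
    }
};

std::vector<double> sorted_copy(const std::vector<double>& unsorted) {
    std::vector<double, default_init_allocator<double>> work;
    work.resize(unsorted.size());  // no zero-fill happens here
    std::copy(unsorted.begin(), unsorted.end(), work.begin());
    std::sort(work.begin(), work.end());
    // Copied out only to return the plain vector type; real code that can
    // keep the allocator-typed vector would skip this final copy.
    return std::vector<double>(work.begin(), work.end());
}
```

A copy-while-sorting first pass could then random-write directly into `work` in place of the `std::copy` call.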
Edit: As the answers and comments don't necessarily see why the "naïve solution" is inefficient, consider the case where the `unsorted` array is in fact already in sorted order (or just needs a single swap to become sorted). In that case, regardless of the sort algorithm, with the naïve solution every value will need to be read at least twice: once while copying, and once while sorting. But with a copy-while-sorting solution, the number of reads could potentially be halved, and thus the performance approximately doubled. A similar situation arises, regardless of the data in `unsorted`, when using sorting algorithms that are more performant than `std::sort` (which may be O(n) rather than O(n log n)).
Upvotes: 7
Views: 7510
Reputation: 28826
Assuming the vector of doubles doesn't contain special numbers like NaN or infinity, the doubles can be treated as 64-bit sign + magnitude integers, which can be converted to 64-bit unsigned integers suitable for a radix sort, which is fastest. These macros can be used to convert back and forth; SM stands for sign + magnitude, ULL for unsigned long long (uint64_t). It's assumed that the doubles' bits are reinterpreted as type unsigned long long in order to use these macros:
#define SM2ULL(x) ((x)^(((~(x) >> 63)-1) | 0x8000000000000000ull))
#define ULL2SM(x) ((x)^((( (x) >> 63)-1) | 0x8000000000000000ull))
Note that using these macros will treat negative zero as less than positive zero, but this is normally not an issue.
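A minimal sketch of how the macros might be applied (the helper names `radix_key` and `from_radix_key` are mine, not from the answer; `memcpy` is used for the bit reinterpretation to avoid undefined behavior):

```cpp
#include <cstdint>
#include <cstring>

#define SM2ULL(x) ((x)^(((~(x) >> 63)-1) | 0x8000000000000000ull))
#define ULL2SM(x) ((x)^((( (x) >> 63)-1) | 0x8000000000000000ull))

// Map a double to an unsigned key whose integer ordering matches the
// doubles' numeric ordering (negative zero sorts below positive zero).
uint64_t radix_key(double d) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return SM2ULL(bits);
}

// Inverse mapping: recover the double from its radix key.
double from_radix_key(uint64_t key) {
    uint64_t bits = ULL2SM(key);
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```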
Since radix sort needs an initial read pass to generate a matrix of counts (which are then converted into the starting or ending indices of logical bucket boundaries), in this case the initial read pass would be a copy pass that also generates the matrix of counts. A base 256 sort would use a matrix of size [8][256], and after the copy, 8 radix sort passes would be performed. If the vector is much larger than cache size, then the dominant time factor will be the random access writes during each radix sort pass.
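The combined copy + count pass could look something like the following sketch (my assumption about how one might code it, not the answerer's implementation): one read of the source both writes out the converted keys and fills the [8][256] count matrix, so no separate counting pass over the input is needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Single pass over src: copy each double out as an order-preserving
// unsigned key while tallying every base-256 digit of that key.
void copy_and_count(const std::vector<double>& src,
                    std::vector<uint64_t>& dst,
                    std::size_t counts[8][256]) {
    dst.resize(src.size());
    std::memset(counts, 0, 8 * 256 * sizeof(std::size_t));
    for (std::size_t i = 0; i < src.size(); ++i) {
        uint64_t bits;
        std::memcpy(&bits, &src[i], sizeof bits);
        // SM2ULL conversion inlined: flip sign bit for positives,
        // complement all bits for negatives.
        uint64_t key = bits ^ (((~bits >> 63) - 1) | 0x8000000000000000ull);
        dst[i] = key;
        for (int d = 0; d < 8; ++d)  // one tally per base-256 digit
            ++counts[d][(key >> (8 * d)) & 0xFF];
    }
}
```

The 8 subsequent radix passes would then shuttle the keys between `dst` and a second buffer using prefix sums of `counts`, converting back with `ULL2SM` at the end.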
Upvotes: 2
Reputation: 92271
The standard library - on purpose - doesn't have a sort-while-copying function, because the copy is O(n) while `std::sort` is O(n log n). So the sort will totally dominate the cost for any larger values of n. (And if n is small, it doesn't matter anyway.)
Upvotes: 11