rlbond
rlbond

Reputation: 67829

Complexity of partial_sort vs nth_element

According to cppreference.com, the complexity of the C++ STL sorting algorithms is:

sort: O(N log(N))

partial_sort: "approximately" O(N log(M)), where M is distance(middle-first)

nth_element: "on average" O(N)

However, this seems to imply that, instead of doing a partial_sort, you could use nth_element and then sort the first range, to give an overall complexity of O(N + M log(M)), which is a bit better than O(N log(M)). Is this actually true? Am I better off avoiding partial_sort?

Upvotes: 24

Views: 5897

Answers (2)

Average Joe
Average Joe

Reputation: 659

std::partial_sort would perform partial sort for the M elements you are interested in. On the other hand std::nth_element would only give you an array, such that nth element is placed such that all elements on the left are smaller and on the right are greater.

Use std::partial_sort for use cases such as, getting top 10 results out of a million in order of rank. Use std::nth_element for finding the median of an array, or to find out who stood 10th in exam results.

If you are just interested in the performance characteristics of both, for smaller values of M, std::partial_sort would perform better than std::nth_element (about 10,000) . For a detailed analysis of this, see: https://www.youtube.com/watch?v=-0tO3Eni2uo

Summary of video

std::nth_element uses modified Quickselect, which provides O(N) complexity regardless of M.

std::partial_sort uses Heapselect, which provides better performance than Quickselect for small M. As a side effect, the end state of Heapselect leaves you with a heap, which means that you get the first half of the Heapsort algorithm "for free".

std::partial_sort is optimized for the case where M is a small constant relative to N. For example, taking the top 10 items from a very large variable-length list. It is not optimized for the other cases.

In a race between std::partial_sort and std::nth_element + std::sort, std::partial_sort jumps out to an early lead (small M) but is overtaken by std::nth_element + std::sort once M is no longer small.

Upvotes: 30

rlbond
rlbond

Reputation: 67829

After extensive testing, it seems that partial_sort is faster for my use-case. I suspected as much -- but this seems to confirm it.

Upvotes: 1

Related Questions