pojdrov

Reputation: 49

Efficiency of sorting algorithms as it relates to input range

I was wondering whether the typical fast sorting algorithms (e.g. quicksort) maintain their superiority when 'unnatural' inputs are used, as opposed to more standard inputs.

For example, if we had an array of N integers in the range 0 to N^4, would quicksort still be the fastest, given the extremely wide range of the integers?

Upvotes: 1

Views: 452

Answers (4)

Kevin

Reputation: 561

The other answers are essentially right: in general, sorting algorithms aren't better or worse based on the range of the inputs. However, there is at least one reason why an algorithm could be better or worse based on input range, and that is how it handles duplicate values.

For example, quicksort is worse on average when there are more duplicate values (see this question for an explanation of why), and when the range of the inputs is greater, the chance of duplicates decreases (assuming the values are distributed throughout the full range).
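To make that concrete, here is a small experiment of my own (not from the original answer) using a deliberately naive, textbook Lomuto-partition quicksort; the function names are mine. With a narrow value range the input is full of duplicates and the sort degrades toward quadratic time:

    #include <algorithm>
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Plain Lomuto-partition quicksort with the last element as pivot (deliberately naive).
    void naive_quicksort(std::vector<uint64_t>& a, int lo, int hi) {
        if (lo >= hi) return;
        uint64_t pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; ++j)
            if (a[j] < pivot) std::swap(a[i++], a[j]);
        std::swap(a[i], a[hi]);
        naive_quicksort(a, lo, i - 1);
        naive_quicksort(a, i + 1, hi);
    }

    double seconds_to_sort(std::vector<uint64_t> v) {   // sorts a copy and times it
        auto t0 = std::chrono::steady_clock::now();
        naive_quicksort(v, 0, static_cast<int>(v.size()) - 1);
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(t1 - t0).count();
    }

    int main() {
        const int n = 50000;                  // kept small: the duplicate-heavy case is near-quadratic
        std::mt19937_64 rng(42);
        std::vector<uint64_t> narrow(n), wide(n);
        for (int i = 0; i < n; ++i) {
            narrow[i] = rng() % 10;           // range 0..9: lots of duplicates
            wide[i] = rng();                  // full 64-bit range: almost no duplicates
        }
        std::printf("narrow range (many duplicates): %.3f s\n", seconds_to_sort(narrow));
        std::printf("wide range   (few duplicates) : %.3f s\n", seconds_to_sort(wide));
        return 0;
    }

Library sorts such as std::sort cope with duplicates far better than this naive version, so the gap mainly matters for hand-rolled quicksorts.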

Upvotes: 0

Matt Jordan

Reputation: 2181

N^4 isn't very big: an array of 2 billion integers would only require 128 bits per integer to meet that requirement. Since this would require at least 8GB to store in memory, you will generally be limited to O(N*log(N)) sorting algorithms that can sort in place, like quicksort, rather than O(N) algorithms that require twice as much memory.
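As a quick sanity check on those figures, here is a back-of-envelope calculation of my own (the numbers are mine, not part of the original answer), assuming N = 2 billion elements with values up to N^4:

    #include <cmath>
    #include <cstdio>

    int main() {
        const double n = 2e9;                               // number of elements
        const double bits_per_value = 4.0 * std::log2(n);   // values go up to N^4
        const double bytes_total = n * 16;                  // stored as 128-bit (16-byte) integers
        std::printf("bits needed per value: ~%.0f (round up to 128)\n", bits_per_value);
        std::printf("array size: ~%.0f GB\n", bytes_total / 1e9);
        return 0;
    }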

Algorithms that allow O(N) (in the best case, which is not likely here) will typically be limited by memory. The example given, radix sort, becomes O(N log(N)) with large data elements, because the data is effectively variable-length: consider an integer that is 32,768 bytes. On a 64-bit machine, your first bucket might be based on the first 8 bytes and the second bucket on the next 8 bytes, but because of the very large possible range and the non-random distribution within buckets, most buckets will be small, leaving a few very large buckets to be sorted with an O(N log(N)) algorithm. This algorithm also requires "buckets" to be allocated to hold the elements for each radix, which will double the total memory requirement.
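For reference, here is a minimal least-significant-digit radix sort for 64-bit keys (my own sketch, not code from the answer, and the function name is made up). Note the second buffer of the same size as the input, which is the memory doubling mentioned above, and that the number of byte passes grows with the key width, which is what erodes the O(N) advantage for very wide keys:

    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Stable LSD radix sort: one counting-sort pass per byte of the key.
    void radix_sort_u64(std::vector<uint64_t>& a) {
        std::vector<uint64_t> buf(a.size());                 // extra O(N) memory
        for (int shift = 0; shift < 64; shift += 8) {        // 8 passes for a 64-bit key
            std::array<std::size_t, 257> count{};            // histogram of one byte
            for (uint64_t x : a) ++count[((x >> shift) & 0xFF) + 1];
            for (int b = 0; b < 256; ++b) count[b + 1] += count[b];   // prefix sums -> start offsets
            for (uint64_t x : a) buf[count[(x >> shift) & 0xFF]++] = x;
            a.swap(buf);                                     // this pass's output becomes the next input
        }
    }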

With small lists of elements that require very expensive comparisons, radix sort might be a good option, but the difference between O(N) and O(N log(N)) may not be as important with small lists.

Also, with very expensive comparisons, such as very large strings, some variation of a Schwartzian transform would probably help, and since each algorithm balances memory against CPU, the optimal sorting algorithm will come down to the choice between using more memory or using more CPU.
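As a sketch of what a Schwartzian transform (decorate-sort-undecorate) can look like in C++: the function name is mine, and string length stands in for whatever expensive derived key a real application would use. Precomputing the key once per element and sorting small (key, index) pairs is exactly the memory-for-CPU trade described above.

    #include <algorithm>
    #include <cstddef>
    #include <string>
    #include <utility>
    #include <vector>

    std::vector<std::string> sort_by_cheap_key(std::vector<std::string> v) {
        // Decorate: compute the key once per element and remember the original index.
        std::vector<std::pair<std::size_t, std::size_t>> keyed(v.size());
        for (std::size_t i = 0; i < v.size(); ++i) keyed[i] = {v[i].size(), i};

        // Sort the small (key, index) pairs instead of repeatedly comparing big strings.
        std::sort(keyed.begin(), keyed.end());

        // Undecorate: rebuild the output in sorted order.
        std::vector<std::string> out;
        out.reserve(v.size());
        for (const auto& ki : keyed) out.push_back(std::move(v[ki.second]));
        return out;
    }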

Extreme cases, such as nearly-sorted lists, might favor a different sorting algorithm, but usually the cost of detecting them is high, and assuming that an extreme case holds can cause big problems if there is ever a chance that it doesn't.

Having said all of that, practical implementations should just use std::sort with operator< or an appropriate comparator unless there is a compelling reason not to, since std::sort can choose from more than one algorithm depending on the input data.
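For completeness, a minimal use of std::sort looks like the following; the std::greater<> comparator is just an example, since std::sort only requires a strict weak ordering via operator< or a supplied comparison:

    #include <algorithm>
    #include <cstdint>
    #include <functional>
    #include <vector>

    int main() {
        std::vector<uint64_t> v = {42, 7, 19, 7, 0};
        std::sort(v.begin(), v.end());                    // ascending, uses operator<
        std::sort(v.begin(), v.end(), std::greater<>());  // descending, custom comparison
        return 0;
    }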

Upvotes: 1

Frank Puffer

Reputation: 8215

All of the well-known sorting algorithms are based on element comparison, i.e. they check whether an element is less than, equal to, or greater than another element. Therefore they are completely independent of the range of the values.
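To illustrate (my own sketch, with a made-up U128 type): a comparison sort only ever asks whether one element is less than another, so it works unchanged however wide the values are.

    #include <algorithm>
    #include <cstdint>
    #include <tuple>
    #include <vector>

    // A 128-bit value represented as two 64-bit halves.
    struct U128 {
        uint64_t hi, lo;
    };

    bool operator<(const U128& a, const U128& b) {
        return std::tie(a.hi, a.lo) < std::tie(b.hi, b.lo);
    }

    int main() {
        std::vector<U128> v = {{1, 5}, {0, 999}, {1, 0}};
        std::sort(v.begin(), v.end());   // only needs operator<, not the value range
        return 0;
    }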

However, there are special cases where the relative performance of certain algorithms can differ strongly from the average case. Examples of such cases are:

  • The elements are already sorted except for a single element or a small subset.
  • The elements are in reverse order.
  • All elements are equal except one.

That's why, for each sorting algorithm, both an average-case and a worst-case performance can be determined.
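If you want to see how a particular implementation handles those special cases, small generators like these (my own helper functions, hypothetical names) produce the corresponding inputs for benchmarking:

    #include <algorithm>
    #include <numeric>
    #include <utility>
    #include <vector>

    std::vector<int> nearly_sorted(int n) {
        std::vector<int> v(n);
        std::iota(v.begin(), v.end(), 0);            // 0, 1, 2, ...
        if (n > 1) std::swap(v.front(), v.back());   // perturb a single element
        return v;
    }

    std::vector<int> reverse_sorted(int n) {
        std::vector<int> v(n);
        std::iota(v.begin(), v.end(), 0);
        std::reverse(v.begin(), v.end());
        return v;
    }

    std::vector<int> all_equal_but_one(int n) {
        std::vector<int> v(n, 7);                    // arbitrary repeated value
        if (n > 0) v.back() = 8;                     // a single different element
        return v;
    }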

Upvotes: 0

ElKamina

Reputation: 7817

Quicksort is not affected by the range of the numbers, but by their order (e.g. whether the numbers are already sorted or sorted in reverse, and whether you pick the first element as the pivot). If you use a random-pivot approach, even that problem is solved.
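A sketch of my own (not from the answer) showing the random-pivot idea with a Lomuto partition; the function name is made up. Swapping a randomly chosen element into the pivot position avoids the quadratic behavior that a fixed first- or last-element pivot shows on already-sorted or reverse-sorted input.

    #include <algorithm>
    #include <random>
    #include <utility>
    #include <vector>

    void quicksort_random_pivot(std::vector<int>& a, int lo, int hi, std::mt19937& rng) {
        if (lo >= hi) return;
        std::uniform_int_distribution<int> pick(lo, hi);
        std::swap(a[pick(rng)], a[hi]);              // random element becomes the pivot
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; ++j)
            if (a[j] < pivot) std::swap(a[i++], a[j]);
        std::swap(a[i], a[hi]);
        quicksort_random_pivot(a, lo, i - 1, rng);
        quicksort_random_pivot(a, i + 1, hi, rng);
    }

It would be called as quicksort_random_pivot(v, 0, (int)v.size() - 1, rng) with rng seeded from std::random_device; note that, as Kevin's answer points out, heavy duplication can still hurt a two-way partition even with a random pivot.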

In summary, every algorithm has a worst-case complexity, and it is usually discussed in the literature about that algorithm.

Upvotes: 1
