Reins

Reputation: 1109

Non-standard sorting algorithm for random unique integers

I have an array of at least 2000 random unique integers, each in the range 0 < n < 65000.

I have to sort it and then get the index of a random value in the array. Each of these operations has to be as fast as possible. For searching, binary search seems to serve well.

For sorting I used the standard quicksort algorithm (qsort), but I was told that, given this information, the standard sorting algorithms will not be the most efficient. So the question is simple: what would be the most efficient way to sort the array, given the information above? I'm totally puzzled by this.

Upvotes: 2

Views: 1404

Answers (5)

Adam Stelmaszczyk

Reputation: 19837

Quicksort's complexity is O(n*log(n)), where n = 2000 in your case; log2(2000) ≈ 10.97.

You can sort in O(n) using one of these algorithms: counting sort, radix sort or bucket sort.

I've compared std::sort() to counting sort for N = 100000000:

#include <iostream>
#include <vector>
#include <algorithm>
#include <ctime>    // clock(), CLOCKS_PER_SEC
#include <cstring>  // memset()
#include <cstdlib>  // rand()
using namespace std;

void countSort(int t[], int o[], int c[], int n, int k) 
{
    // Count the occurrences of each value of t[] in c[].
    for (int i = 0; i < n; i++)
        c[t[i]]++;
    // Make c[i] hold the number of elements less than or equal to i.
    for (int i = 1; i <= k; i++)
        c[i] += c[i - 1];
    // Place each element of t[] into its correct sorted position in the output o[].
    for (int i = n - 1; i >= 0; i--) 
    {
        o[c[t[i]] - 1] = t[i];
        --c[t[i]];
    }
}

void init(int t[], int n, int max)
{
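    // Note: rand() % max can produce duplicates, unlike the question's
    // unique values, but that does not affect the timing comparison.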
    for (int i = 0; i < n; i++)
        t[i] = rand() % max;
}

double getSeconds(clock_t start)
{
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}

void print(int t[], int n)
{
    for (int i = 0; i < n; i++)
        cout << t[i] << " ";
    cout << endl;
}

int main()
{
    const int N = 100000000;
    const int MAX = 65000;
    int *t = new int[N];

    init(t, N, MAX);
    //print(t, N);
    clock_t start = clock();
    sort(t, t + N);  
    cout << "std::sort " << getSeconds(start) << endl;
    //print(t, N);

    init(t, N, MAX);
    //print(t, N);
    // o[] holds the sorted output.
    int *o = new int[N];
    // c[] holds counters.
    int *c = new int[MAX + 1];
    // Set counters to zero.
    memset(c, 0, (MAX + 1) * sizeof(*c));
    start = clock();
    countSort(t, o, c, N, MAX);
    cout << "countSort " << getSeconds(start) << endl;
    //print(o, N);

    delete[] t;
    delete[] o;
    delete[] c;
    return 0;
}

Results (in seconds):

std::sort 28.6
countSort 10.97

For N = 2000 both algorithms finish too quickly for clock() to measure (both report 0).

Upvotes: 0

Steve Jessop

Reputation: 279285

I don't know why the person who told you that would be so perversely cryptic, but indeed qsort is not the most efficient way to sort integers (or generally anything) in C++. Use std::sort instead.

It's possible that you can improve on your implementation's std::sort for the stated special case (2000 distinct random integers in the range 0-65k), but you're unlikely to do a lot better and it almost certainly won't be worth the effort. The things I can think of that might help:

  • use a quicksort, but with a different pivot selection or a different threshold for switching to insertion sort from what your implementation of sort uses. This is basically tinkering.

  • use a parallel sort of some kind. 2000 elements is so small that I suspect the time to create additional threads will immediately kill any hope of a performance improvement. But if you're doing a lot of sorts then you can average the cost of creating the threads across all of them, and only worry about the overhead of thread synchronization rather than thread creation.

That said, if you generate and sort the array, then look up just one value in it, and then generate a new array, you would be wasting effort by sorting the whole array each time. You can just run across the array counting the number of values smaller than your target value: this count is the index it would have. Use std::count_if or a short loop.
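
A minimal sketch of that counting approach (the function and variable names here are illustrative, not part of the answer):

#include <algorithm>
#include <cstddef>
#include <vector>

// The index a value would have in the sorted array equals the number of
// elements smaller than it (the values are unique). O(n) per query.
std::size_t sortedIndexOf(const std::vector<int>& a, int target)
{
    return static_cast<std::size_t>(
        std::count_if(a.begin(), a.end(),
                      [target](int x) { return x < target; }));
}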

Each of these operations has to be as fast as possible.

That is not a legitimate software engineering criterion. Almost anything can be made a minuscule bit faster with enough months or years of engineering effort -- nothing complex is ever "as fast as possible", and even if it were you wouldn't be able to prove that it cannot be faster, and even if you could there would be new hardware out there somewhere, or soon to be invented, for which the fastest solution is different and better. Unless you intend to spend your whole life on this task and ultimately fail, get a more realistic goal ;-)

Upvotes: 7

pentadecagon

Reputation: 4847

For sorting uniformly distributed random integers, Radix Sort is typically the fastest algorithm; it can be faster than quicksort by a factor of 2 or more. However, it may be hard to find an optimized implementation, whereas quicksort is far more ubiquitous. Quicksort can have very bad worst-case performance, O(N^2), so if worst-case performance is important you have to look elsewhere; one candidate is introsort, which is what std::sort in C++ typically uses. (LSD radix sort on fixed-width keys, by contrast, runs in O(d*(N+K)) regardless of input order.)
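
For illustration, a minimal LSD radix sort specialized to 16-bit keys (enough for values below 65000); this is a sketch, not a tuned implementation:

#include <cstddef>
#include <cstdint>
#include <vector>

// Sort 16-bit keys with two passes over 8-bit digits, least significant first.
void radixSort16(std::vector<std::uint16_t>& a)
{
    std::vector<std::uint16_t> buf(a.size());
    for (int shift = 0; shift < 16; shift += 8)
    {
        std::size_t count[257] = {0};
        for (std::uint16_t v : a)          // histogram of the current byte
            ++count[((v >> shift) & 0xFF) + 1];
        for (int i = 1; i < 257; i++)      // prefix sums -> bucket starts
            count[i] += count[i - 1];
        for (std::uint16_t v : a)          // stable scatter into buf
            buf[count[(v >> shift) & 0xFF]++] = v;
        a.swap(buf);
    }
}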

For array lookup a hash table is by far the fastest method. If you don't want yet another data structure, you can always pick binary search. If you have uniformly distributed numbers, interpolation search is probably the most effective method (best average performance).
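
A sketch of interpolation search (a hypothetical helper, assuming a sorted array of distinct values as in the question):

#include <vector>

// Probe where the key "should" be if values were spread uniformly.
// Average O(log log n) on uniform data; O(n) in the worst case.
int interpolationSearch(const std::vector<int>& a, int key)
{
    int lo = 0, hi = (int)a.size() - 1;
    while (lo <= hi && key >= a[lo] && key <= a[hi])
    {
        int mid = (a[hi] == a[lo])
            ? lo
            : lo + (int)((long long)(hi - lo) * (key - a[lo]) / (a[hi] - a[lo]));
        if (a[mid] == key) return mid;   // found: return its index
        if (a[mid] < key)  lo = mid + 1;
        else               hi = mid - 1;
    }
    return -1;                           // not present
}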

Upvotes: 1

vhallac

Reputation: 13917

Since the domain of your numbers is so small, you can create an array of 65000 entries, set the entry at each number's index to one, and then sweep the array collecting the indices that are set; they come out in sorted order. That is exactly 67000 iterations in total (assuming initialization of the array is without cost).
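
A minimal sketch of that flag-array idea, with std::bitset standing in for the 65000-entry array (names are illustrative):

#include <bitset>
#include <vector>

// Mark each value, then sweep the whole domain in order.
// Works because the values are unique and all below 65000.
std::vector<int> flagSort(const std::vector<int>& a)
{
    std::bitset<65000> seen;              // all bits start at zero
    for (int v : a)
        seen.set(v);
    std::vector<int> sorted;
    sorted.reserve(a.size());
    for (int v = 0; v < 65000; v++)       // one pass over the domain
        if (seen.test(v))
            sorted.push_back(v);
    return sorted;
}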

Since the lists contain only 2000 entries, an O(n*log(n)) sort will probably be faster in practice. I can think of no other O(n) algorithm for this, so I suppose you are better off with a general-purpose algorithm.

Upvotes: 0

Michael

Reputation: 5897

Standard sorting algorithms, as well as standard nearly anything, are very good general-purpose solutions. If you know nothing about your data, if it truly consists of "random unique integers", then you might as well go with one of the standard implementations.

On the other hand, most programming problems appear in a context that tells you something about the data, and that additional information usually leads to more efficient problem-specific solutions.

For example, does your data appear all at once or in chunks? If it comes piecemeal you may speed things up by interleaving incremental sorting, such as dual-pivot quicksort, with data acquisition.
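
As a rough illustration of such interleaving (using plain sorted insertion rather than the dual-pivot variant mentioned above; the names are hypothetical):

#include <algorithm>
#include <vector>

// Keep the data sorted as each chunk arrives: binary-search for the
// insertion point, then insert. Cheap for small, infrequent chunks.
void absorbChunk(std::vector<int>& sorted, const std::vector<int>& chunk)
{
    for (int v : chunk)
        sorted.insert(std::lower_bound(sorted.begin(), sorted.end(), v), v);
}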

Upvotes: 0
