ViniciusArruda

Reputation: 990

Can the radix sort algorithm have different run-times for data in different orders?

I am trying to implement the radix sort algorithm. I am using a linked list to store the elements to be sorted, and then I drop each element into its bucket. Each bucket is just a pointer that links the list of elements belonging to that bucket.

I am testing with 10,000,000 and 100,000,000 integers in the interval [0, 1000000]. These numbers can be in ascending, descending, or random order.

The run-time for 100,000,000 numbers in ascending or descending order is about 20 seconds, but for the same number of elements in random order it is about 110 seconds.

As I understand it, this algorithm has the same complexity regardless of the initial order of the data to be sorted.

Does anyone know why this is happening?

This is my code:

    /* LSD radix sort on a singly linked list: seven passes, one per decimal
       digit, distributing nodes into ten buckets and re-linking them in order.
       Assumes the list is non-empty. */
    void radix(Number** numbers)
    {
        unsigned int i, k, e = 1;
        Number* bucket[10];   /* head of each digit's bucket */
        Number* tail[10];     /* tail of each digit's bucket, for O(1) append */
        Number* index;

        for(k = 0; k < 7; k++, e *= 10)   /* one pass per decimal digit */
        {
            for(i = 0; i < 10; i++) bucket[i] = tail[i] = NULL;

            /* Distribute each node into the bucket of its current digit. */
            index = *numbers;
            while(index != NULL)
            {
                i = (index->value / e) % 10;

                if(tail[i] == NULL)
                    bucket[i] = index;
                else
                    tail[i]->next = index;

                tail[i] = index;
                index = index->next;
            }

            /* Re-link buckets 0..9 into a single list: the head of the result
               is the first non-empty bucket, and each remaining non-empty
               bucket is appended to the tail of the previous one. */
            for(i = 0; i < 10; i++)
            {
                if(tail[i] != NULL)
                {
                    *numbers = bucket[i];
                    index = tail[i];
                    for(i++; i < 10; i++)
                    {
                        if(tail[i] != NULL)
                        {
                            index->next = bucket[i];
                            index = tail[i];
                        }
                    }
                }
            }

            index->next = NULL;   /* terminate the re-linked list */
        }
    }

where Number is:

    typedef struct number
    {
        unsigned int value;
        struct number* next;
    } Number;

Upvotes: 0

Views: 196

Answers (1)

user4842163


The answer is probably going to be related to memory access and locality of reference.

Ascending/descending order has a regular pattern to it that likely gives greater temporal locality, not so much with respect to the buckets themselves but more with respect to the way the linked list nodes for the numbers are accessed (especially if they aren't contiguous in memory).

For instance, if we take the input:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ...

We cycle through the buckets in order and then wrap back around to bucket 0. When we come back to a bucket, appending to it touches a node that was accessed fairly recently (only about ten iterations earlier), so it is likely still sitting in a faster, smaller level of memory.

If we use a random ordering, who knows when we'll come back to bucket 0? As a result, there is a greater likelihood of long stretches of data being moved from DRAM to cache before we return to the memory used by the node at the tail of any given bucket. More of those earlier nodes end up evicted from the cache, and we take more cache misses when we come back to such a bucket.
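To make this more concrete, here is a small diagnostic sketch of my own (report_jumps is a made-up helper, not part of the question's code). Assuming the nodes were allocated one after another in input order, it walks the list and reports the average distance in memory between consecutive nodes; calling it after each pass shows how sequentially (or not) the next pass will have to walk through memory. With ascending input the list stays in allocation order, so the jumps stay tiny; with random input the list gets more scrambled relative to memory with every pass, so the jumps grow large, and that is where the cache misses come from.

    #include <stdint.h>
    #include <stdio.h>

    /* Same Number struct as in the question. */
    typedef struct number
    {
        unsigned int value;
        struct number* next;
    } Number;

    /* Hypothetical diagnostic: average absolute address jump between
       consecutive nodes while walking the list. Small, steady averages mean
       the traversal moves through memory roughly sequentially; large ones
       mean it hops all over the heap, which is what hurts the cache. */
    static void report_jumps(const Number* list)
    {
        unsigned long long total = 0;
        unsigned long count = 0;
        const Number* p;

        for(p = list; p != NULL && p->next != NULL; p = p->next)
        {
            uintptr_t a = (uintptr_t)p;
            uintptr_t b = (uintptr_t)p->next;
            total += (a < b) ? (b - a) : (a - b);
            count++;
        }

        if(count > 0)
            printf("average jump between nodes: %llu bytes\n", total / count);
    }

Calling report_jumps(*numbers) after each pass on the ascending input and on the random input should make the difference in traversal behaviour obvious.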

Branch mispredictions could also be consuming a bit of time with respect to the irregular ordering. Profiling should help narrow the cause.

One possible thing to try, if you are actually memory-bottlenecked, is to turn your buckets into, say, unrolled lists into which you deep copy the numbers. That way you no longer touch the memory of previously inserted number nodes, which could have been inserted many iterations back (a potentially large and variable distance, given the random ordering). With that we get back some of the temporal locality (and possibly spatial locality, if the numbers were contiguously allocated) that we would otherwise lose with this kind of linked list representation. It then becomes about reusing the contiguous memory of the buckets (of which there are only 10) instead of elements within buckets separated by variable strides. We also get spatial locality within the buckets with an unrolled representation.
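To illustrate the suggestion, here is a minimal sketch of unrolled buckets, assuming the values are deep copied out of the original list; Chunk, Bucket, bucket_push and CHUNK_CAP are illustrative names, not part of the question's code, and error handling is omitted for brevity:

    #include <stdlib.h>

    #define CHUNK_CAP 128                 /* values per chunk; worth tuning */

    typedef struct chunk
    {
        unsigned int values[CHUNK_CAP];   /* contiguous storage for this chunk */
        unsigned int count;               /* number of slots in use */
        struct chunk* next;               /* next chunk in the same bucket */
    } Chunk;

    typedef struct
    {
        Chunk* head;                      /* first chunk, read back in order */
        Chunk* tail;                      /* chunk currently being filled */
    } Bucket;

    /* Deep copy a value into a bucket, growing it one chunk at a time.
       Appends only ever touch the bucket's tail chunk, which stays hot in
       cache, and the values inside a chunk are contiguous. */
    static void bucket_push(Bucket* b, unsigned int value)
    {
        if(b->tail == NULL || b->tail->count == CHUNK_CAP)
        {
            Chunk* c = calloc(1, sizeof(Chunk));   /* zeroed: count = 0, next = NULL */
            if(b->tail == NULL)
                b->head = c;
            else
                b->tail->next = c;
            b->tail = c;
        }
        b->tail->values[b->tail->count++] = value;
    }

Reading a pass back out is then just a sequential walk over buckets 0..9 and their chunks, so both the writes during distribution and the reads afterwards stay mostly within a handful of hot, contiguous regions.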

But if it is the same data, just in a different order, can it really make this much of a difference? 20 versus 110 seconds seems like a lot for the same data.

Memory efficiency can make a difference of orders of magnitude: http://lwn.net/Articles/250967/

I'm not an expert in this subject (more of a "profile it and try to optimize based on guidelines" type), but based on past results I've gotten from memory optimizations, I would often put their effect on par with algorithmic optimizations. Exceptions would be when the difference in complexity is gross (e.g., linearithmic vs. quadratic), but even a linearithmic algorithm can quite feasibly beat a linear one on very large inputs if the former is significantly more cache-friendly.

Upvotes: 2
