Quicksort in C with huge amount of data / memory

Question

I have the following code setup:

#define TSIZE 32
#define TNUM 24000000
#define CORES 4

/* Byte-wise swap two items of size SIZE. */
#define SWAP(a, b, size)                    \
  do                                        \
    {                                       \
      size_t __size = (size);               \
      char *__a = (a), *__b = (b);          \
      do                                    \
      {                                     \
        char __tmp = *__a;                  \
        *__a++ = *__b;                      \
        *__b++ = __tmp;                     \
      } while (--__size > 0);               \
  } while (0)

char* TWEETS;

size_t partition(void* arr, size_t left, size_t right, int (*compar)(const void* , const void*))
{
    char* cArr = (char*) arr;

    size_t i;
    size_t pivotIndex = (left+right)/2;
    char* pivotValue = &cArr[(size_t)TSIZE * right];
    size_t index = left;

    SWAP(&cArr[(size_t)TSIZE * pivotIndex], &cArr[(size_t)TSIZE * right], (size_t)TSIZE);

    for(i = left; i < right; i++) {
        if(compar((void*) &cArr[(size_t)TSIZE * i], (void*) pivotValue) < 0) {
            SWAP(&cArr[(size_t)TSIZE * i], &cArr[(size_t)TSIZE * index], (size_t)TSIZE);
            index++;
        }
    }
    SWAP(&cArr[(size_t)TSIZE * index], &cArr[(size_t)TSIZE * right], (size_t)TSIZE);
    return index;
}

void quicksort(void* base, size_t left, size_t right, int (*compar)(const void* , const void*))
{
    if(left < right) {
        size_t pivot = partition(base, left, right, compar);

        #pragma omp task
        quicksort(base, left, pivot-1, compar);

        #pragma omp task
        quicksort(base, pivot+1, right, compar);
    }
}

int main(int argc, char** argv) {
    omp_set_dynamic(0);
    omp_set_num_threads(CORES);
    TWEETS = (char*) malloc((size_t)TNUM * (size_t)TSIZE * (size_t)CORES * (size_t)sizeof(char));
    if(TWEETS == NULL) exit(1);

    readData();

    #pragma omp parallel
    {
        #pragma omp single
        quicksort(TWEETS, 0, ((size_t)CORES*(size_t)TNUM)-(size_t)1, compare);
    }

    free(TWEETS);
}

So first of all, forgive the huge amount of (size_t) casts, I did this in a fit of desperation.

What am I doing here
I am reading in a text file with 24 million lines of text, each line contains 32 bytes of characters. The lines are then sorted according to the compare function, which I have omitted here. I can guarantee that this function works and is not the cause of my trouble. At all times it returns either -1, 0 or 1.
I am also trying to parallelize the quicksort algorithm. The lines of code grow along with the amount of cores I use, e.g. 1 core = 24million, 2 cores = 48 million and so on.

What is working already
Working already is sorting the file with 1 to 8 cores as long as the file size stays below 48 million lines of text.

What is my problem
My problem is that, once I try to sort a file with 72 million lines of text or more the quicksort algorithm runs into a segmentation fault. I have debugged the code with gdb as far as I could, and the code that is at fault is this line:

SWAP(&cArr[(size_t)TSIZE * i], &cArr[(size_t)TSIZE * index], (size_t)TSIZE);

It's the swap call in the partition function in the for loop. I could also see that, at this point, the variable "right" had the value 18446744073709551615 (2^64-1) which is the cause of the segmentation fault. The maximum value "right" should ever have is TSIZE * TNUM * CORES. Since the number is that huge my only guess is that an overflow happens somewhere in the algorithm.

And well, here's the catch: The algorithm and the entire program work flawlessly when staying <= 48 million lines of text. Once I go beyond that the segfault happens. I have also made sure that reading in the data works, meaning after the process of reading in the data about 3gb of my RAM are in use. The segfault definitely occurs during the sorting of the char array.

So why is it working with up to 48.000.000 lines of text, and why is it segfaulting when having more than that? Where is my mistake?

WhozCraig · Accepted Answer

You have an edge case in your algorithm that isn't accounted for.

If the bottom (left edge) partition of the original sequence ever encounters no swaps (i.e. every value is "greater-or-equal" than the pivot), then index , which started out at zero (0), will remain as such. The index i will march to the end. The pivot value that your temporarily storing in the right-most slot is then swapped into place (i.e. cArr[0] and cArr[right] are swapped), and you return 0 from the function. In other words, this:

size_t pivot = partition(base, left, right, compar);

#pragma omp task
quicksort(base, left, pivot-1, compare);
// here ================^

executes withpivot being returned as zero from the prior call, passes pivot-1 as right and results in an underflow. This would give you exactly the value of right you're getting when you fault. (2^64-1 on every platform I've ever used).

You need to account for this (or never let it happen in the first place). Whether this happens in your code is entirely dependent on the content of each partition that is processed with left=0. It may not happen the first time, the second time, etc.. But get the right data swapped into that continually decreasing partition space and eventually it can happen.

Untested, But Worth A Look

I'm not a fan of left and right partitioning markers in C-implementations of quicksort() in the first place. The language supports pointer math, so use that and tout around something you know is concrete (a base and a length). I've not tested the following, and have only once ever had to deal with OMP, but simplified, what I mean is something like this:

void quicksort(void* base, size_t len, int(*compar)(const void*, const void*))
{
    if (len < 2)
        return;

    char* cArr = (char*)base;
    char* pivotValue = cArr + ((size_t)TSIZE * (len - 1));
    SWAP(cArr + ((size_t)TSIZE * (len / 2)), pivotValue, TSIZE);

    size_t i;
    size_t pivot = 0;

    for (i = 0; i < len; ++i)
    {
        if (compar(cArr + ((size_t)TSIZE * i), ) < 0)
        {
            SWAP(cArr + ((size_t)TSIZE * i), cArr + ((size_t)TSIZE * pivot), (size_t)TSIZE);
            ++pivot;
        }
    }
    SWAP(cArr + ((size_t)TSIZE * pivot), pivotValue, (size_t)TSIZE);

#pragma omp task
    quicksort(cArr, pivot++, compar);

#pragma omp task
    quicksort(cArr+((size_t)TSIZE * pivot), len-pivot, compar);
}

I hope it is obvious how this is called.

Quicksort in C with huge amount of data / memory

Answers (2)

Related Questions