Is this implementation of concurrent quicksort correct?

I have implemented concurrent quicksort in C++ using OpenMP.

#include <omp.h>
#include <iostream>
#include <algorithm>
using namespace std;

void sort(int *a, int low, int high);
int partition(int *a, int low, int high);

class QuickSort {
    private:
        int *arr;
        int len;

    public:
        void init();
        void Sort();
        void Display();
};

int main() {
    cout << "Program implementing Quicksort." << endl;

    QuickSort a;

    a.init();
    a.Sort();
    a.Display();
}

void sort(int *a, int low, int high) {
    if(high < low || low == high)
        return;
    if(high == low+1) {
        if(a[low] > a[high])
            swap(a[low], a[high]);
        return;
    }
    int pivotidx = partition(a, low, high);
    /*for(int i = 0; i < 5; ++i)
        cout << a[i] << " ";
    cout << endl;*/
    cout << "Pivot element has been placed at correct position: " << pivotidx << " by thread " << omp_get_thread_num() << endl;
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            sort(a, low, pivotidx);
        }
        #pragma omp section
        {
            sort(a, pivotidx+1, high);
        }
    }
}

int partition(int *a, int low, int high) {
    int pivot = low;
    int pivotval = a[low];
    int leftpointer = low;
    int rightpointer = high;
    while(leftpointer < rightpointer) {
        while(a[leftpointer] <= a[pivot] && leftpointer <= high)
            ++leftpointer;
        if(leftpointer > high)
            --leftpointer;
        while(a[rightpointer] >= a[pivot] && rightpointer >= low)
            --rightpointer;
        if(rightpointer < low)
            ++rightpointer;
        if(leftpointer < rightpointer)
            swap(a[leftpointer], a[rightpointer]);
    }
    a[low] = a[rightpointer];
    a[rightpointer] = pivotval;
    return rightpointer;
}

void QuickSort::init() {
    cout << "Enter the number of elements in the array: ";
    cin >> len;

    cout << "Enter the elements of the array: ";
    arr = new int[len];
    for(int i = 0; i < len; ++i)
        cin >> arr[i];
}

void QuickSort::Sort() {
    sort(arr, 0, len-1);
}

void QuickSort::Display() {
    cout << "Sorted array is: " << endl;
    for(int i = 0; i < len; ++i)
        cout << arr[i] << " ";
    cout << endl;
}

It's sorting correctly but I am not sure if it's really running on multiple cores. How can I check this? Also, my parallel code is very similar to the one present in the top answer here. There it's mentioned at the end that this can not extract more parallelism than two threads: if it is executed with more threads, the other threads don't have any work to do and will just sit down idle. Why is it so?

Upvotes: 2

Views: 1797

Answers (1)

Zulan
Zulan

Reputation: 22670

There is a subtle error in partition:

    while(a[leftpointer] <= a[pivot] && leftpointer <= high)
    ...
    while(a[rightpointer] >= a[pivot] && rightpointer >= low)

In both cases, you must change the order those checks, otherwise you sometimes access a[leftpointer] while leftpointer > high which may be out of bound. Similarly for the second while condition.

Also do not lie to the reader, leftpointer is not a pointer, but an index! There are other severe style issues, but since this is not CodeReview, I focus on the parallelization.

Using sections for parallelism here is far from ideal. For instance, you must enable nested parallelism to even have more than two threads possibly active at the same time. Instead, you should use OpenMP tasks. Now spawning a task for each and every call to sort is bad, because it will create many tiny tasks and have a bad overhead / work ratio. Instead, create tasks only for large-enough chunks of data, and make sure that no runtime overhead occurs down in the recursion. For that a sophisticated second recursive function is the best option:

void sort_serial(int* a, int low, int high)
{
    if (high < low || low == high)
        return;
    if (high == low + 1)
    {
        if (a[low] > a[high])
            swap(a[low], a[high]);
        return;
    }
    int pivotidx = partition(a, low, high);
    sort_serial(a, low, pivotidx);
    sort_serial(a, pivotidx + 1, high);
}

void sort(int* a, int low, int high)
{
    if (high < low || low == high)
        return;
    if (high == low + 1)
    {
        if (a[low] > a[high])
            swap(a[low], a[high]);
        return;
    }
    int pivotidx = partition(a, low, high);

    // This is an arbitrary threshold.
    if (high - low > 1000000)
    {
#pragma omp task
        {
            sort(a, low, pivotidx);
        }
#pragma omp task
        {
            sort(a, pivotidx + 1, high);
        }
    }
    else
    {
        sort_serial(a, low, pivotidx);
        sort_serial(a, pivotidx + 1, high);
    }
}

To get going with tasks, you must create a parallel region somewhere - and usually narrow it down for a single thread for starting, e.g. like so:

void QuickSort::Sort()
{    
#pragma omp parallel
    {
#pragma omp single
        sort(arr, 0, len - 1);
    }
}

For a large-enough input, and a good choice of a threshold, this will expose sufficient work that can be done in parallel, but not create a huge overhead.

To check how this runs in parallel, usually one uses operating specific monitoring tools, e.g. time on Linux. You can also use a sophisticated performance analysis tool, that can tell you in detail how threads are executing your tasks in parallel.

Upvotes: 1

Related Questions