Mayank Kansal
Mayank Kansal

Reputation: 1

Multithreading taking equal time as single thread quick sorting

I'm working on linux but multithreading and single threading both are taking 340ms. Can someone tell me what's wrong with what I'm doing?

Here is my code

#include<time.h>
#include<fstream>
#define SIZE_OF_ARRAY 1000000
using namespace std;

struct parameter
{
        int *data;
        int left;
        int right;
};
void readData(int *data)
{
        fstream iFile("Data.txt");
        for(int i = 0; i < SIZE_OF_ARRAY; i++)
                iFile>>data[i];
}

int threadCount = 4;

int Partition(int *data, int left, int right)
{
        int i = left, j = right, temp;
        int pivot = data[(left + right) / 2];
        while(i <= j)
        {
                while(data[i] < pivot)
                        i++;
                while(data[j] > pivot)
                        j--;
                if(i <= j)
                {
                        temp = data[i];
                        data[i] = data[j];
                        data[j] = temp;
                        i++;
                        j--;
                }
        }
        return i;
}

void QuickSort(int *data, int left, int right)
{
        int index = Partition(data, left, right);
        if(left < index - 1)
                QuickSort(data, left, index - 1);
        if(index < right)
                QuickSort(data, index + 1, right);
}

//Multi threading code starts from here
void *Sort(void *param)
{
        parameter *param1 = (parameter *)param;
        QuickSort(param1->data, param1->left, param1->right);
        pthread_exit(NULL);
}

int main(int argc, char *argv[])
{
        clock_t start, diff;
        int *data = new int[SIZE_OF_ARRAY];
        pthread_t threadID, threadID1;
        pthread_attr_t attr;
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
        pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
        parameter param, param1;
        readData(data);
        start = clock();
        int index = Partition(data, 0, SIZE_OF_ARRAY - 1);
        if(0 < index - 1)
        {
                param.data = data;
                param.left = 0;
                param.right = index - 1;
                pthread_create(&threadID, NULL, Sort, (void *)&param);
        }
        if(index < SIZE_OF_ARRAY - 1)
        {
                param1.data = data;
                param1.left = index + 1;
                param1.right = SIZE_OF_ARRAY;
                pthread_create(&threadID1, NULL, Sort, (void *)&param1);
        }
        pthread_attr_destroy(&attr);
        pthread_join(threadID, NULL);
        pthread_join(threadID1, NULL);
        diff = clock() - start;
        cout<<"Sorting Time = "<<diff * 1000 / CLOCKS_PER_SEC<<"\n";
        delete []data;
        return 0;
}
//Multithreading Ends here

Single thread main function
int main(int argc, char *argv[])
{
        clock_t start, diff;
        int *data = new int[SIZE_OF_ARRAY];
        readData(data);
        start = clock();
        QuickSort(data, 0, SIZE_OF_ARRAY - 1);
        diff = clock() - start;
        cout<<"Sorting Time = "<<diff * 1000 / CLOCKS_PER_SEC<<"\n";
        delete []data;
        return 0;
}
//Single thread code ends here
some of functions single thread and multi thread use same

Upvotes: 0

Views: 1137

Answers (3)

Martin James
Martin James

Reputation: 24857

Build a thread pool with a producer-consumer queue with 24 threads hanging off it. Partition your data into two and issue a mergesort task object to the pool, the mergesort object should issue further pairs of mergesorts to the queue and wait on a signal for them to finish and so on until a mergersort object finds that it has [L1 cache-size data]. The object then qicksorts its data and signals completion to its parent task.

If that doesn't turn out to be blindingly quick on 24 cores, I'll stop posting about threads..

..and it will handle multiple sorts in parallel.

..and the pool can be used for other tasks.

.. and there is no No performance-destroying, deadlock-generating join(), synchronize(), (if you except the P-C queue, which only locks for long enough to push an object ref on), no thread-creation overhead and no dodgy thread-stopping/terminating/destroying code. Like the barbers, there is no waiting - as soon as a thread is finished with a task it can get another.

No thread micro-management, no tuning, (you could create 64 threads now, ready for the next generation of boxes). You could make the thread count tuneable - just add more threads at runtime, or delete some by queueing up poison-pills.

You don't need a reference to the threads at all - just set 'em off, (pass queue as parameter).

Upvotes: 1

n. m. could be an AI
n. m. could be an AI

Reputation: 119877

clock returns total CPU time, not wall time.

If you have 2 CPUs and 2 threads, then after a second of running both thread simultaneously clock will return CPU time of 2 seconds (the sum of CPU times of each thread).

So the result is totally expected. It does not matter how many CPUs you have, the total running time summed over all CPUs will be the same.

Upvotes: 2

Alexis Wilke
Alexis Wilke

Reputation: 20731

Note that you call Partition once from the main thread...

The code works on the same memory block which prevents a CPU from working when the other accesses that same memory block. Unless your data is really large you're likely to have many such hits.

Finally, if your algorithm works at memory speed when you run it with one thread, adding more threads doesn't help. I did such tests a while back with image data, and having multiple thread decreased the total speed because the process was so memory intensive that both threads were fighting to access memory... and the result was worse than not having threads at all.

What makes really fast computers today go really is fast is running one very intensive process per computer, not a large number of threads (or processes) on a single computer.

Upvotes: 1

Related Questions