Reputation: 23
I am trying to compare the performance of merge sort in a single-threaded and a multi-threaded program. Sorting an array of size ~50000 with a single thread took 0.01x seconds, whereas sorting an array of the same size with 2/4/8 threads took 0.02-0.03 seconds. I know the difference is not much, but I'm curious: what could be the reason for the slowdown in the multi-threaded program? Below is the code for the single-threaded program (main function's code):
srand(clock()); //to seed-random numbers
readData(A,n);
clock_t start=clock();
mergeSort(A,0,n-1);
clock_t end=clock();
And for the multi-threaded program:
int n=50000; //n is the size
int no_of_threads=4;
limit S; //structure containing array,start and end index
srand(clock()); //to seed-random numbers
generateData(&S,n);
pthread_t id[no_of_threads];
int i=0,size=0,k=n/no_of_threads;
clock_t start=clock();
for(i=0; i<no_of_threads; i++)
{
    S.start=size, S.end=size+k-1;
    pthread_create(&id[i], NULL, sorter, &S);
    size=size + k;
}
for(i=0; i<no_of_threads; i++)
    pthread_join(id[i], NULL);
mergeSort(S.A,0,n-1);
clock_t end=clock();
Sorter function:
void* sorter(void *s)
{
    limit *S = (limit*)s;
    int start = S->start, end = S->end;
    mergeSort(S->A, start, end);
    return NULL; //a pthread start routine must return a void*; the result is unused here
}
Upvotes: 1
Views: 609
Reputation: 28826
Looks like you're using a common structure for S, so S may be getting updated by the main loop in parallel with the thread creation. Perhaps make S an array of no_of_threads structs, then pass &S[i] to each created thread.
#define no_of_threads 4
limit S[no_of_threads];
// ...
for(i=0; i<no_of_threads; i++)
{
    S[i].start=size, S[i].end=size+k-1;
    pthread_create(&id[i], NULL, sorter, &S[i]);
    size=size + k;
}
// ... after the joins, do a k-way merge (not a merge sort).
I did this a while back using a bottom-up merge sort, and your example with top-down merge sort uses the same idea. For k threads, split the array into k parts (in my simple example I assume the array size is a multiple of k), then merge sort the k parts in parallel; so far this is the same as your code (except for the common S structure). My version then merges pairs of runs of size n/k using k/2 threads, each doing a 2-way merge, then merges pairs of runs of size 2n/k using k/4 threads, again each doing a 2-way merge, and so on.

Before I tested this, I expected little gain because I thought the tight loop in the merge part (compare two elements, move the smaller) would be memory-bandwidth limited, but it turns out that loop is CPU limited. On an Intel 3770K at 3.5 GHz with 4 cores, for k = 4 the merge sort was 3 times as fast as single-threaded merge sort, and for k = 8 it was about 3.9 times as fast. Most of the speed-up is due to the local L1 and L2 caches in each core.

Link to my prior thread about this; although it's a Windows example with separate main thread functions, consider it a proof of concept that multi-threaded merge sort is faster than single-threaded merge sort.
https://codereview.stackexchange.com/questions/148025/multithreaded-bottom-up-merge-sort
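Not the code from that thread, but here is a minimal pthread sketch of the merging phase described above, assuming int elements, a thread count k that is a power of two, n a multiple of k, and the k sorted runs already sitting in A; the scratch buffer B and the names merge2 / merge_worker / merge_pass_parallel are made up for this sketch:

#include <pthread.h>
#include <string.h>

typedef struct { int *src, *dst; int lo, mid, hi; } merge_job; //one 2-way merge

static void merge2(int *src, int *dst, int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)               //standard 2-way merge
        dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

static void* merge_worker(void *p)
{
    merge_job *m = (merge_job*)p;
    merge2(m->src, m->dst, m->lo, m->mid, m->hi);
    return NULL;
}

//After joining the k sort threads: runs of length n/k are in A.
//Merge pairs of runs with k/2 threads, then k/4, ... until one run remains.
void merge_pass_parallel(int *A, int *B, int n, int k)
{
    int run = n / k;
    int *src = A, *dst = B;
    while (run < n) {
        int pairs = n / (2 * run);
        pthread_t tid[pairs];
        merge_job job[pairs];
        for (int p = 0; p < pairs; p++) {
            job[p] = (merge_job){ src, dst, 2*run*p, 2*run*p + run, 2*run*(p+1) };
            pthread_create(&tid[p], NULL, merge_worker, &job[p]);
        }
        for (int p = 0; p < pairs; p++)
            pthread_join(tid[p], NULL);
        int *tmp = src; src = dst; dst = tmp;  //ping-pong the two buffers
        run *= 2;
    }
    if (src != A)                              //final run may have ended up in B
        memcpy(A, src, n * sizeof *A);
}

Halving the number of threads at each level matches the k/2, k/4, ... description above, and the buffer ping-pong avoids allocating a temporary array per merge.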
Upvotes: 0
Reputation: 8517
Instead of just dividing the work, you are doing extra work. When the number of threads is x, each thread sorts 1/x of the array.
After all the threads have completed, you are again calling merge sort on the whole array, which will recursively partition the array right down to the bottom and merge, ignoring the fact that the sub-parts are already sorted.
One way to overcome this is, instead of calling mergeSort() again, to simply merge the x sorted sub-parts, which can be done in O(n·x) time.
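As a sketch of that idea (assuming int elements, n a multiple of x, and the x sorted chunks sitting back to back in A; the name merge_chunks and the separate output buffer out are made up for illustration):

//Merge x sorted, equally sized chunks of A into out.
//Each output element scans the x chunk heads, so the total cost is O(n*x).
void merge_chunks(const int *A, int *out, int n, int x)
{
    int chunk = n / x;
    int head[x];                          //current index into each chunk
    for (int c = 0; c < x; c++)
        head[c] = c * chunk;

    for (int k = 0; k < n; k++) {
        int best = -1;
        for (int c = 0; c < x; c++) {     //pick the smallest remaining head
            int end = (c + 1) * chunk;
            if (head[c] < end && (best < 0 || A[head[c]] < A[head[best]]))
                best = c;
        }
        out[k] = A[head[best]++];
    }
}

With the code in the question, this would replace the final mergeSort(S.A, 0, n-1) call after the joins.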
Upvotes: 1