Parallel merge sort using OPENMP

I have a written a sequential merge sort program as below:

#include "stdafx.h"
#include "iostream"
#include "omp.h"
#include "fstream"
using namespace std;

int a[50];
void merge(int,int,int); 
void merge_sort(int low,int high)
{
int mid,newval;
double clock, clock1,clock2;
if(low<high)
 {
    mid=(low+high)/2;
    #pragma omp parallel shared(low,mid,high) num_threads(2)
    {
        //newval=omp_get_thread_num();
        //cout<<"thread: "<<newval<<endl;
        merge_sort(low,mid);
        clock=omp_get_wtime();
        //cout<<"Clock: "<<clock<<endl;
        merge_sort(mid+1,high);
        merge(low,mid,high);
        clock1=omp_get_wtime();
        //cout<<"Clock1: "<<clock<<endl;
        clock2=clock1-clock;
        cout<<"Clock2: "<<clock2<<endl;
    }       
    //cout<<"valud=%d"<<low<<endl; 
 }
 }
 void merge(int low,int mid,int high)
 {
 int h,i,j,b[50],k;
 h=low;
 i=low;
 j=mid+1;

 while((h<=mid)&&(j<=high))
 {
    if(a[h]<=a[j])
    {
        b[i]=a[h];
        h++;
    }
    else
    {
        b[i]=a[j];
        j++;
    }
    i++;
 }
 if(h>mid)
 {
    for(k=j;k<=high;k++)
    {
        b[i]=a[k];
        i++;
    }
 }
 else
 {
    for(k=h;k<=mid;k++)
    {
        b[i]=a[k];
        i++;
    }
 }
 for(k=low;k<=high;k++) a[k]=b[k];
    }

    void main()
    {
int num,i;
int clock_n,len;
FILE *fp;
char *buf;
char *newchat;//ifstream properfile;


cout<<"********************************************************************************"<<endl;
cout<<"                             MERGE SORT PROGRAM"<<endl;

cout<<"********************************************************************************"<<endl;
cout<<endl<<endl;
cout<<"Please Enter THE NUMBER OF ELEMENTS you want to sort [THEN PRESS ENTER]:"<<endl;
cout<<endl;
//cout<<"Now, Please Enter the ( "<< num <<" ) numbers (ELEMENTS) [THEN PRESS ENTER]:"<<endl;
//for(i=1;i<=num;i++)
//{
        fp=fopen("E:\\Study\\Semester 2\\Compsci 711- Parallel and distributed computing\\Assignment\\sample_10.txt","rb");
fseek(fp,0,SEEK_END); //go to end
len=ftell(fp); //get position at end (length)
cout<<"Length is %d"<<len<<endl;
//fseek(fp,0,SEEK_SET); //go to beg.
buf=(char *)malloc(len); //malloc buffer
newchat=buf;
fread(newchat,len,1,fp); //read into buffer
fclose(fp);
//cout<<"Read %c"<<newchat<<endl;

////cin>>num;


//}

    merge_sort(1,len);

cout<<endl;
cout<<"So, the sorted list (using MERGE SORT) will be :"<<endl;
cout<<endl<<endl;

for(i=1;i<=num;i++)
cout<<a[i]<<"   ";
cout<<endl<<endl<<endl<<endl;

  }

Now I want to parallelize this code(API used for parallelization in C is OPENMP). Can you help people me out? Basically I use #pragma parallel num_thread(4) but I dont know whether I should include anything else in order for parallelization to take place.

Upvotes: 0

Views: 8666

Answers (2)

Marc Holmes
Marc Holmes

Reputation: 1

If sorting under 128 intergers 32bit (which will fit well in the cpu cache) you should do a Binary Insert sort is generaly best.

If sort larger numbers these following papers cover how to make a parrallel merge sort.

http://www1.chapman.edu/~radenski/research/papers/mergesort-pdpta11.pdf

This paper covers the splits parralel on both OMP and MPI what it doesnt not explain is how to do the merges in parallel

http://www.cc.gatech.edu/~ogreen3/_docs/Merge_Path_-_Parallel_Merging_Made_Simple.pdf

This paper explains how to do the merges in parralel. Despite its name it was pretty confusing to me at first but it boils down to this, when sorting two allready sorted list the rank sort(the normal merge method) either goes down (up array A) or a cross (up array B) the merge matrix in what is called the merge path. if using multiple processors you can split the region up and do a rank sort at any piont by finding the merge path by using the diagonals and binary search.

Upvotes: 0

prathmesh.kallurkar
prathmesh.kallurkar

Reputation: 5686

The main bottleneck of a merge-sort algorithm is the merge function. Its complexity is O(n). The cost of first few merge operations is going to dominate the cost of your complete application. Use an optimized parallel algorithm for larger arrays.

For smaller arrays (<20 elements), avoid the barriers. Actually I would prefer a sequential O(n^2) algorithm.

Shouldn't you use sections instead of #pragma omp parallel shared(low,mid,high) num_threads(2)

Upvotes: 1

Related Questions