yot
yot

Reputation: 11

Merge Sort issue when removing the array copy step

I've been having an issue that I couldn't debug for quite some time. I am trying to implement a MergeSort algorithm with no additional steps of array copying by following Robert Sedgewick's algorithm in "Algorithm's in C++" book.
Short description of the algorithm:

The recursive program is set up to sort b, leaving results in a. Thus, the recursive calls are written to leave their result in b, and we use the basic merge program to merge those files from b into a. In this way, all the data movement is done during the course of the merges.

The problem is that I cannot find any logical errors but the sorting isn't done properly. Data gets overwritten somewhere and I cannot determine what logical error causes this. The data is sorted when the program is finished but it is not the same data any more.
For example, Input array: { A, Z, W, B, G, C } produces the array: { A, G, W, W, Z, Z }.

I can obviously see that it must be a logical error somewhere, but I have been trying to debug this for a pretty long time and I think a fresh set of eyes could maybe see what I'm missing cause I really can't find anything wrong.

Full code I am running:

//Here is my complete code that I run and that behaves as specified above.

#include <iostream>
#include <stdlib.h>

using namespace std;

// function to print the array
void print(char * a[],int l, int r)
{ for(int i=l;i<=r;i++) cout << (i+1) << ": " << a[i] << endl; }

static const int M = 1;

void insertion(char** a, int l, int r)
{ int i,j;
  char * temp;
  for(i=1;i<r+1;i++)
  { temp = a[i];
    j = i;
    while((j>0) && strcmp(a[j-1],temp)>0)
    { a[j] = a[j-1];
      j = j - 1; }
     a[j] = temp; } }

//merging a and b into c
void merge(char ** c,char ** a, int N, char ** b, int M)
{ for (int i=0, j=0, k=0; k<(N+M); k++)
  { if(i == N) {c[k] = b[j++]; continue; }
if(j == M) {c[k] = a[i++]; continue; }
c[k] = strcmp(a[i], b[j])<0 ? a[i++] : b[j++]; } }

 void mergesortAux(char ** a, char ** b, int l, int r)
 { if(r-l <= M) { insertion(a, l, r); return; }
   int m = (l+r)/2;
   mergesortAux(b, a, l, m);        //merge sort left
   mergesortAux(b, a, m+1, r);      //merge sort right
   merge(a+l,b+l,m-l+1,b+m+1,r-m);  }       //merge

void mergesort(char ** a,int l, int r, int size)
{ static char ** aux = (char**)malloc(size*sizeof(char*));
  for(int i = l; i<size; i++) aux[i] = a[i];
  mergesortAux(a,aux,l,r); }
 //free(aux); } I removed this piece of code as suggested, I realize it's unnecessary 

int main(int argc, char * argv[]) 
{ int size = 6;
  char **data = (char**)malloc(size*sizeof(char*));
  data[0] = "A";
  data[1] = "Z";
  data[2] = "W";
  data[3] = "B";
  data[4] = "G";
  data[5] = "C";

  print(data,0,size-1);
  printf("---------------------------\n");

  mergesort(data,0,size-1,size);

  printf("---------------------------\n");
  print(data,0,size-1);
  return 0;
}

Output:

1: A
2: Z
3: W
4: B
5: G
6: C
---------------------------
---------------------------
1: A
2: G
3: W
4: W
5: Z
6: Z

Upvotes: 0

Views: 514

Answers (1)

BartoszKP
BartoszKP

Reputation: 35891

Your whole code is completely messed up.

You're confusing the input array with the output array, with an attempt to sort in place all the time. Your code formatting is terrible. Your variable names are terrible. You're using redundant arguments (you're passing size along with the index range l and r). You confuse the meaning of the index ranges. There are so many things wrong with this code, that I could just describe here whole night.

However, only the last one is the critical here:

In this line:

{ if(r-l <= M) { insertion(a, l, r); return; }

You attempt to sort the l to r part of the array a. However, the l index is not used in the insertion function so it modifies a from 0 to r inclusive. r suggests that it's the right index, but is used in this method as the array's size.

The simplest fix is to change it to:

if(r-l <= M) { insertion(a+l, l, r-l); return; }

But fixing the insertion method to handle the indexes properly is probably a better idea.

Also, I strongly recommend you clean up the whole code. This is unmaintainable. This is unreadable. This is the darkest side of C. I won't ever understand why people can't use understandable function/variable names, use structs to represent composite concepts and partition the logic into small readable parts. As if writing in C forced you to think that you're setting the individual bits in processor's registers and couldn't use a variable name that is more than 1 character long.

Even you - the author of this code have problems understanding it. This is a strong suggestion that there are some serious problems with it :)

Sorry for the frustration, I couldn't resist the puzzle and wasted a lot of time on this :)

Upvotes: 2

Related Questions