Reputation: 109
I am trying to merge sort a on a key of a fairly large doubly-linked list in C, which has about 100,000 elements. Here is the structure for the DLL elements:
struct Pore {
int ns; /* voxel number */
int radius; /* effective radius of porosity surrounding a pore */
struct Pore *next;
struct Pore *prev;
};
After searching around for algorithms, the one I found most often used comprises three functions: mergeSort
, merge
, and split
. I am including them here... please excuse the multiple printf
s in the merge
function because I have been trying to debug a segmentation fault that happens upon the 4097592-nd recursive entry into the merge
function.
Recur01
and Recur02
are global variables that I defined to help with the debugging.
void mergeSort(struct Pore **head)
{
Recur01++;
/* Base case: 0 or 1 pore */
if ((*head) == NULL) {
printf("\nEnter mergeSort %ld, list head is NULL ",Recur01);
fflush(stdout);
return;
}
if ((*head)->next == NULL) {
printf("\nEnter mergeSort %ld, list head next is NULL ",Recur01);
fflush(stdout);
return;
}
printf("\nEnter mergeSort %ld",Recur01);
fflush(stdout);
/* Split head into 'a' and 'b' sublists */
struct Pore *a = *head;
struct Pore *b = NULL;
split(*head, &a, &b);
/* Recursively sort the sublists */
mergeSort(&a);
mergeSort(&b);
/* Merge the two sorted halves */
*head = merge(a,b);
printf("\nExit mergeSort %ld",Recur01);
fflush(stdout);
return;
}
void split(struct Pore *head, struct Pore **a, struct Pore **b)
{
int count = 0;
int lngth = 1;
struct Pore *slow = head;
struct Pore *fast = head->next;
struct Pore *temp;
temp = head;
while (temp->next != NULL) {
lngth++;
/*
printf("\n Length = %d",lngth);
fflush(stdout);
*/
if (temp->next) {
temp = temp->next;
}
}
while (fast != NULL) {
printf("\nCount = %d",count);
fflush(stdout);
fast = fast->next;
if (fast != NULL) {
slow = slow->next;
fast = fast->next;
}
count++;
}
printf("\nDone with while loop, final count = %d",count);
fflush(stdout);
*b = slow->next;
slow->next = NULL;
printf("\nExit split");
fflush(stdout);
return;
}
struct Pore *merge(struct Pore *a, struct Pore *b)
{
Recur02++;
if (Recur02 >= 4097591) {
printf("\nEnter merge %ld",Recur02);
fflush(stdout);
}
/** If first linked list is empty, return the second list */
/* Base cases */
if (a == NULL) return b;
if (b == NULL) return a;
if (Recur02 >= 4097591) {
printf("\n Made it 01");
fflush(stdout);
}
/* Pick the larger key */
if (a->radius > b->radius) {
if (Recur02 >= 4097591) {
printf("\n Made it 02 a is bigger, Recur02 = %ld",Recur02);
fflush(stdout);
printf(" a->next->ns = %d",a->next->ns);
fflush(stdout);
printf(" b->ns = %d",b->ns);
fflush(stdout);
}
a->next = merge(a->next,b);
a->next->prev = a;
a->prev = NULL;
if (Recur02 >= 4097591) {
printf("\nExit merge a %ld",Recur02);
fflush(stdout);
}
return a;
} else {
if (Recur02 >= 4097591) {
printf("\n Made it 02 b is bigger, Recur02 = %ld",Recur02);
fflush(stdout);
printf(" b->next->ns = %d",b->next->ns);
fflush(stdout);
printf(" a->ns = %d",a->ns);
fflush(stdout);
}
b->next = merge(a,b->next);
b->next->prev = b;
b->prev = NULL;
if (Recur02 >= 4097591) {
printf("\nExit merge b %ld",Recur02);
fflush(stdout);
}
return b;
}
}
Running the code works, like I said, until I get to the 4097592-nd entry into merge
. I put a printf
right before the function call and another one immediately upon entry into the function. I also printf
the keys of the elements in the function argument, and they seem okay too. I'm not sure what else to try to get to the bottom of this. Below is the last couple dozen lines of the output:
Exit mergeSort 529095
Exit mergeSort 529095
Enter merge 4097591
Made it 01
Made it 02 a is bigger, Recur02 = 4097591 a->next->ns = 156692 b->ns = 20
Enter merge 4097591
Enter merge 4097592
Made it 01
Made it 02 a is bigger, Recur02 = 4097592 a->next->ns = 156693 b->ns = 20
That is the last line that gets flushed from the buffer before the segmentation fault. I have run out of ideas for how to debug this, so will be grateful for any advice.
Upvotes: 0
Views: 474
Reputation: 28826
The segmentation fault is due to using a recursive merge that calls itself for every node merged. It's OK for the main code to be top down, since that will have stack space complexity of O(log2(n)), but the merge function needs to be iterative.
most often used
The original implementation of std::list::sort() is a bottom up merge sort for linked lists that uses a small array (25 to 32) of lists (or pointers or iterators to the first nodes of a list).
https://en.wikipedia.org/wiki/Merge_sort#Bottom-up_implementation_using_lists
Probably most implementations of std::list::sort were bottom up until Visual Studio 2015, which switched from using an array of lists to using iterators (to avoid issues like no default allocator and provide exception safety). This came up in a prior thread, and initially I just accepted the change assuming the switch to iterators required the change to top down. The question came up again later, so I looked into this and determined there was no need to switch to top down merge sort. My main regret is not looking into this from the original question. I did update my answer to show a stand-along iterator based bottom up merge sort, as well as a replament for std::list::sort in VS2019 include file.
`std::list<>::sort()` - why the sudden switch to top-down strategy?
In most cases, provided there is enough memory, it's faster to copy the list to an array (or vector), sort the array, and create a new sorted list. If the nodes in a large linked list are randomly scattered, that can translate into a cache miss for nearly every node accessed. By moving the list to array, the sequential access by merge sort on runs in an array are more cache friendly. This is how Java's native sort for linked lists is implemented, although part of that is due to using a common collections.sort() for multiple container types, including a linked list, while the C++ standard library std::list is an independent container type with it's list specific member functions.
Upvotes: 1
Reputation: 109
@VladfromMoscow suggested using a non-recursive sorting algorithm because recursive ones are not good for long lists. Therefore, I tried to adapt for my doubly linked list an iterative version of merge sort here. Works like a charm. At least in this case it seems that the recursion really was too deep for a list this long.
Upvotes: 0