RomaValcer
RomaValcer

Reputation: 2956

Merge sort for linked list too slow

My task is to read a file, and sort it, all using a singly linked list. I implemented merge sort, but the testing system says it is too slow on big files. How can I optimise this?

void merge_sort(list *a, list *tmp, int n) {
    int k, i, rcount;
    list *cursor, *l, *r, *end;
    k = n/2;
    if(n == 1) {
        return;
    }
    l = a;
    end = list_get(a, k);
    r = end;
    merge_sort(a, tmp, k);
    merge_sort(r, tmp, n - k);
    rcount = k;
    for(cursor = tmp, i = 0; i < n; cursor = cursor->next, i++) {
        if((l != end) && (((rcount == n) || (strcmp(l->value, r->value) < 0)))) {
            cursor->value = l->value;
            l = l->next;
        } else {
            cursor->value = r->value;
            r = r->next;
            rcount++;
        }
    }
    for(cursor = tmp, i = 0; i < n; cursor = cursor->next, a = a -> next, i++) {
        a->value = cursor->value;
    }
    return;
}

Upvotes: 0

Views: 107

Answers (1)

rcgldr
rcgldr

Reputation: 28826

I'm assuming that the requirement is to sort the list, not to sort the data within the nodes (which could be done by creating an array of pointers to nodes and using merge / quick sort via the array of pointers to rearrange the data within the nodes).

Using a top down merge / quick sort is not a good approach for linked lists, due to all the scanning done to emulate random access iterators to recursively split the list.

A bottom up approach is much faster. You could use 4 pointers to nodes as pointers to 4 lists and implement something like tape sort, but that requires using counters to keep track of the logical ends of runs within each list. Wiki article:

http://en.wikipedia.org/wiki/Merge_sort#Use_with_tape_drives

It's simpler and faster still to use a small (26 to 32) array of pointers. This is the algorithm used in HP / Microsoft standard template library to sort a list. Wiki article:

http://en.wikipedia.org/wiki/Merge_sort#Bottom-up_implementation_using_lists

Example code. On my system, Intel 2600K 3.4ghz, it can sort 4 million nodes with pseudo random 32 bit unsigned integers as data in about a second.

/* prototype */
NODE * MergeLists(NODE *pSrc1, NODE *pSrc2);

/* sort list using array of pointers to first nodes of list   */
/* aList[i] = NULL or ptr to list with 2 to the power i nodes */

#define NUMLISTS 32                 /* size of array */
NODE * SortList(NODE *pList)
{
NODE * aList[NUMLISTS];             /* array of lists */
NODE * pNode;
NODE * pNext;
int i;
    if(pList == NULL)               /* check for empty list */
        return NULL;
    for(i = 0; i < NUMLISTS; i++)   /* zero array */
        aList[i] = NULL;
    pNode = pList;                  /* merge nodes into array */
    while(pNode != NULL){
        pNext = pNode->next;
        pNode->next = NULL;
        for(i = 0; (i < NUMLISTS) && (aList[i] != NULL); i++){
            pNode = MergeLists(aList[i], pNode);
            aList[i] = NULL;
        }
        if(i == NUMLISTS)           /* don't go past end of array */
            i--;
        aList[i] = pNode;
        pNode = pNext;
    }
    pNode = NULL;                   /* merge array into one list */
    for(i = 0; i < NUMLISTS; i++)
        pNode = MergeLists(aList[i], pNode);
    return pNode;
}

/* mergelists -  compare uses src2 < src1           */
/* instead of src1 <= src2 to be similar to C++ STL */

NODE * MergeLists(NODE *pSrc1, NODE *pSrc2)
{
NODE *pDst = NULL;                  /* destination head ptr */
NODE **ppDst = &pDst;               /* ptr to head or prev->next */
    if(pSrc1 == NULL)
        return pSrc2;
    if(pSrc2 == NULL)
        return pSrc1;
    while(1){
        if(pSrc2->data < pSrc1->data){  /* if src2 < src1 */
            *ppDst = pSrc2;
            pSrc2 = *(ppDst = &(pSrc2->next));
            if(pSrc2 == NULL){
                *ppDst = pSrc1;
                break;
            }
        } else {                        /* src1 <= src2 */
            *ppDst = pSrc1;
            pSrc1 = *(ppDst = &(pSrc1->next));
            if(pSrc1 == NULL){
                *ppDst = pSrc2;
                break;
            }
        }
    }
    return pDst;
}

Upvotes: 3

Related Questions