dmars
dmars

Reputation: 33

Mergesort return no longer in order

I'm not a beginning python coder, but I'm not an advanced one either. The problem will be easier to explain if you can look at what I'm describing. Here's the code I'm seeing the problem in:

import operator

def mergeSort(L, compare = operator.lt):
    if len(L) < 2:
        return L[:]
    else:
        middle = int(len(L)/2)
        left = mergeSort(L[:middle], compare)
        right = mergeSort(L[middle:], compare)
        return merge(left, right, compare)

def merge(left, right, compare):
    result = []
    i,j = 0, 0
    while i < len(left) and j < len(right):
        if compare(left[i], right[j]):
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    while (i < len(left)):
        result.append(left[i])
        i += 1
    while (j < len(right)):
        result.append(right[j])
        j += 1
    return result

The merge function works. The final return result is always in order. The problem is that if you take one step forward from the final return result, returning to the return merge(left, right, compare) in mergesort, the returned list L is no longer in order, and then this badly ordered list is returned to the function that called mergesort. I can't even see why this is happening. What could happen on the return trip from return result to return merge(left, right, compare)? The returned list isn't a randomized list, it looks like a partially sorted list, but it was completely sorted at the final return result, and this result was not the one returned to mergesort.

I've used this mergesort before with no problems, so the issue could be in the data I'm sorting. L is a list of lists, and each list element is a list of strings, all the same length, to be exported to csv and eventually to Excel. My data is a table with website titles in the first column and urls in the second column, and I'm sorting them to identify the duplicates. The complete table has several other fields in it, but I see this problem with only the title and url columns, which was what I simplified down to so I could try to see what was going wrong. I wasn't sure if mergesort could handle this data structure, but the final return result in merge certainly indicates that it can. But something mysterious happens in the final return.

Upvotes: 0

Views: 102

Answers (2)

dmars
dmars

Reputation: 33

I ended up writing the python.org list about this problem, and the first reply I got hit the nail on the head:

On Thu, Dec 22, 2016 at 11:55 AM, Deborah Swanson wrote:

The problem is that while mergeSort puts the list ls in perfect order, which I can see by looking at result on merge's final return to mergeSort, and at the left and the right once back in mergeSort. Both the left half and the right half are in order. But the list L is still in its original order, and after mergeSort completes, ls is still in its original order. Maybe there's some bonehead error causing this, but I just can't see it.

Your analysis is excellent. Here's what happens: When you merge-sort, you're always returning a new list (either "return L[:]" or "result = []"), but then you call it like this:

# sort: Description only, to make hyperelinks & find duplicates
mergeSort(ls)

This calls mergeSort, then drops the newly-sorted list on the floor. Instead, try: "ls = mergeSort(ls)".

Thank you for making it so easy for us!

ChrisA

So, it is possible to use merge-sort to sort a list of lists, something I wasn't sure of until now.

Upvotes: 0

Raymond Hettinger
Raymond Hettinger

Reputation: 226624

It looks like the result of the merge wasn't being saved. Here's a fix:

def mergeSort(L, compare = operator.lt):
    if len(L) < 2:
        return L[:]
    else:
        middle = int(len(L)/2)
        left = mergeSort(L[:middle], compare)
        right = mergeSort(L[middle:], compare)
        L[:] = merge(left, right, compare)
    return L

Upvotes: 1

Related Questions