ilmoi
ilmoi

Reputation: 2534

Merge sort in python: slicing vs iterating - impact on complexity

I want to check that my understanding of how python handles slices is correct.

Here's my implementation of merge sort:

def merge_sort(L):
    def merge(a, b):
        i, j = 0, 0
        c = []
        while i < len(a) and j < len(b):
            if a[i] < b[j]:
                c.append(a[i])
                i += 1
            elif b[j] < a[i]:
                c.append(b[j])
                j += 1
        if a[i:]:
            c.extend(a[i:])
        if b[j:]:
            c.extend(b[j:])
        return c

    if len(L) <= 1:
        return L
    else:
        mid = len(L) // 2
        left = merge_sort(L[:mid])
        right = merge_sort(L[mid:])
        return merge(left, right)

Am I right in thinking that I could replace this:

if a[i:]:
    c.extend(a[i:])
if b[j:]:
    c.extend(b[j:])

With this:

while i < len(a):
    c.append(a[i])
    i += 1
while j < len(b):
    c.append(b[j])
    j += 1

And have the exact same level of complexity? My understanding of slicing is that its complexity is equivalent to slice length? Is that correct?

Does the fact that I'm calling a slice twice (first in the condition, second time inside of it) make it 2x complexity?

Upvotes: 3

Views: 644

Answers (1)

chqrlie
chqrlie

Reputation: 144770

Your implementation of mergesort has problems:

  • in the merge function's main loop, you do nothing if the values in a[i] and b[j] are equal, or more precisely if you have neither a[i] < b[i] nor a[i] > b[i]. This causes an infinite loop.
  • there is no need to define merge as a local function, actually there is no need to make it a separate function, you could inline the code and save the overhead of a function call.

Here is a modified version:

def merge_sort(L):
    if len(L) <= 1:
        return L
    else:
        mid = len(L) // 2
        a = merge_sort(L[:mid])
        b = merge_sort(L[mid:])
        i, j = 0, 0
        c = []
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                c.append(a[i])
                i += 1
            else:
                c.append(b[j])
                j += 1
        if a[i:]:
            c.extend(a[i:])
        else:
            c.extend(b[j:])
        return c

Regarding performance, slicing or iterating has no impact on complexity since both operations have linear time cost.

Regarding performance, here are directions to try:

  • replace the test if a[i:] with if i < len(a). Creating the slice twice is costly.
  • perform the sort in place, avoiding the append operations
  • restructure the main loop to have a single test per iteration

Here is a modified version:

def merge_sort(L):
    if len(L) <= 1:
        return L
    else:
        mid = len(L) // 2
        a = merge_sort(L[:mid])
        b = merge_sort(L[mid:])
        i, j, k = 0, 0, 0
        while True:
            if a[i] <= b[j]:
                L[k] = a[i]
                k += 1
                i += 1
                if (i == len(a)):
                    L[k:] = b[j:]
                    return L
            else:
                L[k] = b[j]
                k += 1
                j += 1
                if (j == len(b)):
                    L[k:] = a[i:]
                    return L

Upvotes: 1

Related Questions