RhythmInk
RhythmInk

Reputation: 553

merge sort python implementation methodology

I'm learning the well known sorting algorithms and implementing them myself. I have recently done merge sort and the code I have is:

def merge(l, r, direction):
    # print('merging')
    # print(l, r)
    # adding infinity to end of list so we know when we've hit the bottom of one pile
    l.append(inf)
    r.append(inf)
    A = []
    i, j = 0, 0

    while (i < len(l)) and (j < len(r)):
        if l[i] <= r[j]:
            A.append(l[i])
            i += 1
        else:
            A.append(r[j])
            j += 1
    # removing infinity from end of list
    A.pop()

    return(A)


def merge_sort(num_lst, direction='increasing', level=0):
    if len(num_lst) > 1:
        mid = len(num_lst)//2
        l = num_lst[:mid]
        r = num_lst[mid:]
        l = merge_sort(l, level=level + 1)
        r = merge_sort(r, level=level + 1)
        num_lst = merge(l, r, direction)

    return num_lst

What I have seen in other implementations that differs from my own is in the merging of lists. Where I just create a blank list and append elements in numerical order other pass the preexisting into merge and overwrite each element to create a list in numerical order. Something like:

def merge(arr, l, m, r): 
    n1 = m - l + 1
    n2 = r- m 
  
    # create temp arrays 
    L = [0] * (n1) 
    R = [0] * (n2) 
  
    # Copy data to temp arrays L[] and R[] 
    for i in range(0 , n1): 
        L[i] = arr[l + i] 
  
    for j in range(0 , n2): 
        R[j] = arr[m + 1 + j] 
  
    # Merge the temp arrays back into arr[l..r] 
    i = 0     # Initial index of first subarray 
    j = 0     # Initial index of second subarray 
    k = l     # Initial index of merged subarray 
  
    while i < n1 and j < n2 : 
        if L[i] <= R[j]: 
            arr[k] = L[i] 
            i += 1
        else: 
            arr[k] = R[j] 
            j += 1
        k += 1
  
    # Copy the remaining elements of L[], if there 
    # are any 
    while i < n1: 
        arr[k] = L[i] 
        i += 1
        k += 1
  
    # Copy the remaining elements of R[], if there 
    # are any 
    while j < n2: 
        arr[k] = R[j] 
        j += 1
        k += 1

I'm curious about the following:

Problem

Is using a append() on a blank list a bad idea? By my understanding when python creates a list it grabs a certain size chunk of memory and if our list grows beyond that copies the list to a different and larger section of memory (which seems it would be pretty high cost if it happened even once for a large list let alone repeatedly). Is there just a higher cost to using append() compared to accessing a list by index? It would seemed to me append would be able to do things at a fairly low cost.

Upvotes: 2

Views: 120

Answers (2)

a_guest
a_guest

Reputation: 36249

When you instantiate a list Python will allocate the memory necessary for holding the items plus some extra memory for future appends / extends. When you put too much extra stuff in the list it eventually needs to be reallocated which potentially slows down the program (see this part of the source code). The reallocation happens here and the size is calculated as:

new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

As the comment states the growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ....

So if the size of the final list is known in advance and allocated from the beginning this will save you some extra reallocations (and thus compute time).

Upvotes: 3

Fabian
Fabian

Reputation: 1150

from time import time 
append_arr = []
index_arr = [None] * 10240*1024

t0 = time()
for x in range(10240*1024):
    append_arr.append(x)
t1 = time()

t2 = time()
for i in range(10240*1024):
    index_arr[i] = i
t3 = time()

print(str(t1-t0))
print(str(t3-t2))

Looks like appending is slower.

Upvotes: 0

Related Questions