chillNZ
chillNZ

Reputation: 319

Sorting algorithm efficiency comparison

I was playing around with simple sorting algorithms to become more familiar with them, and tried to create insertion sort from the description of the algorithm rather than the pseudocode. I made an algorithm that worked and that I thought fitted the description:

import random
nums = []
for i in range(10000):
    nums.append(random.randrange(0, 1000))

def getSortedIndexForEle(ele, sorted):
    for i in range(0, len(sorted)):
        if ele < sorted[i]:
            return i
    return len(sorted)

def sort(lst):
    sorted = []
    for ele in lst:
        sorted.insert(getSortedIndexForEle(ele, sorted), ele)
    return sorted

print(sort(nums))

However the code did not match the algorithm compisition wise (but still produced the correct result) so I had another attempt:

import random
nums = []
for i in range(10000):
    nums.append(random.randrange(0, 1000))

def sort(lst):
    for i in range(1, len(lst)):
        ele = lst[i]
        j = i - 1
        for k in range(j, -2, -1):
            if ele >= lst[k]:
                break
            lst[k + 1] = lst[k]
        lst[k + 1] = ele


sort(nums)
print(nums)

I believe this one is correct, and I compared it to the pseudocode and it does effectively the same thing.

My question is the first algorithm that I made, which did not fit the algorithm, executes in around half the time of the actual thing on my machine (using a list of length 10000, every element a random number). How can this be possible? Is my second algorithm not correct? I even tried a python example of the algorithm and that also took longer than my first one...

Upvotes: 1

Views: 409

Answers (2)

sircodesalot
sircodesalot

Reputation: 11439

Heres an insertion sort off top of my head (sorry it's in pseudo code, but it should give you a basic idea of how it works).

for (int index = 1; index < array.length; index++) {
    for (int insertion = index; insertion > 0 && array[insertion] < array[insertion - 1]; --insertion) {
        swap(array[insertion], array[insertion - 1];
    } 
}

Think about it this way, we have two moving cursors:

for (int index = 1; index < array.length; index++)

The job of this one is to slide all the way to the end of the collection (notice we start at the second item, not the first!).

Then we have this one:

for (int insertion = index - 1; array[insertion] < array[insertion - 1] && insertion > 0; --insertion)

This one is long so I'll take it apart:

for (int insertion = index; ...; --insertion)

This means that our comparison point starts at wherever index points, and moves backwards after every iteration. (Hmm, Why backwards?)

for (; insertion > 0 && array[insertion] < array[insertion - 1];)

as we slide backwards we need to make two comparisons:

(1) Is the insertion index greater than zero? Obviously if we reach the 'base' of the collection, then we're obviously done. No need to continue and we let index move forward one step.

(2) Is the item pointed to by insertion less than the item before it? That's not right, we need to swap those. In fact, we'll continue swapping pairs of items backwards until everything to the left of index is sorted.

Basically this combination of for statements stays:

(1) Put index at 1, then sort everything to the left of it. There are only two items [0] and [1], so that's easy.

(2) Move index forward, now there are three items. Sort [0], [1], and [2]. Keep moving index forward one by one, and sorting the left-hand-side as we progress.

(3) We 'sort' by ensuring that the new item introduced by moving index forward gets slid backwards into place. That's the point of array[insertion] < array[insertion - 1]. In other words, slide newly introduced items backwards until they sit into place.

(4) Since the left hand side is always sorted by definition, all we need to do after incrementing index is to make sure to move the new item into place.

Increment, then slide the new item into place. Rinse and repeat.

Visually:

[1, 2, 3, 4, 0]

Zero is out of place, so when index = 4, insertion = 4 as well. Then we start comparing backwards.

Is array[4] < array [4 - 1]? Yes, 0 < 4, so we swap:

[1, 2, 3, 0, 4]

Again, is array[3] < array[3 - 1]? Yes, 0 < 3 swap:

[1, 2, 0, 3, 4]

Again and again until we move the 0 to the first position because that that point array[insertion] < array[insertion - 1] will be false.

Basically all you're doing is making sure the left hand side of index has no items such that array[insertion] < array[insertion - 1]. If that's true for 2 elements, then we check for three, and then for four etc.

Interestingly we don't need to check the entire left side each time, because once it's sorted, the only element that can be out of place is the NEW element introduced by incrementing index. That is, if we add a new item, ONLY that item can be out of place, so once we move that one into place, by definition the entire left side is sorted!

Upvotes: 0

Joseph Mark
Joseph Mark

Reputation: 9418

The second one sorts in-place, the first one doesn't.

In the second algorithm, you insert each element in the original array and every element after that has to be shifted to accommodate it. Time complexity is O(n2) but it only requires constant O(1) extra memory.

In the first one, you insert an element in separate array and only the larger elements that have already been sorted have to be shifted. So time complexity is somewhere between O(n) and O(n2) but it requires O(n) extra memory.

Upvotes: 1

Related Questions