Reputation: 1207
I've got two implementations of insertion sort. The first is pretty much a transcription of the example in my java textbook (albeit with a while
loop instead of the java for
loop)
def insertion_sort(array):
for i in range(1,len(array)):
j = i
while j > 0 and array[j] < array[j-1]:
array[j],array[j-1] = array[j-1], array[j]
j=j-1
return array
The second seems to be a more pythonic implementation.
def insertion_sort2(array):
for i in range(1,len(array)):
for j in range(i-1,-1,-1):
if array[j+1] < array[j]:
array[j+1],array[j] = array[j],array[j+1]
else:
break
return array
After timing both, it seems the second is much slower (3 to 4 times slower) when the list is already sorted, or very nearly sorted. However, as the number of inversions increase the second implementation seems to gain the upper hand (10-15% faster). I was just wondering if someone could explain what cause of this discrepancy might be. Thanks!
Edit: Here's the function I use to create a random list.
def rand_list(length):
return [random.randint(0,9*length) for _ in range(length)]
When I want a list that's partially sorted I just call list(range(length1)) + rand_list(length2)
.
As for timing how long they take to run I've used both the %timeit
tool and the difference between two datetime.now()
calls. They've both given pretty consistent results. Also I just want to emphasize, that I'm creating a new list every time, not sorting an already sorted list.
Upvotes: 4
Views: 205
Reputation: 226296
I was able to reproduce your timings. For random data or reverse sorted data, insertion_sort2 is a bit faster than insertion_sort on Python3.6. However, as you noted, insertion_sort2 is three times slower for data that is already sorted.
The reason for the first case (slightly faster on random or sorted data) is that range_iterator objects generate decreasing integers faster (using C-code internally) than manually writing j=j-1
and j > 0
which are both pure python steps. Accordingly, once the for-loop is warmed-up, it runs a bit faster than the while-loop equivalent.
The reason for the second case (much slower on sorted data) is that a while-loop is cheaper than the equivalent for-loop when there are zero iterations. This is because the while-loop has no set-up costs. The equivalent for-loop still has to create both a range object and a range_iterator object before it can figure-out there a no iterations. Accordingly, while-loops beat for-loops when empty or nearly empty because they avoid set-up costs.
Upvotes: 1