Reputation: 101
lately im comparing different types of sort algorithms in python. I noticed that my quicksort isnt handling well inputs where values are repeated.
def compare_asc(a, b):
return a <= b
def partition(a, p, r, compare):
pivot = a[r]
i = p-1
for j in range(p, r):
if compare(a[j], pivot):
i += 1
a[i], a[j] = a[j], a[i]
a[i+1], a[r] = a[r], a[i+1]
return i + 1
def part_quick_sort(a, p, r, compare):
if p < r:
q = partition(a, p, r, compare)
part_quick_sort(a, p, q-1, compare)
part_quick_sort(a, q+1, r, compare)
def quick_sort(a, compare):
part_quick_sort(a, 0, len(a)-1, compare)
return a
Then I test this
import numpy as np
from timeit import default_timer as timer
import sys
test_list1 = np.random.randint(-10000, 10000, size=10000).tolist()
start = timer()
test_list1 = quick_sort(test_list1, compare_asc)
elapsed = timer() - start
print(elapsed)
test_list2 = np.random.randint(0, 2, size=10000).tolist()
start = timer()
test_list2 = quick_sort(test_list2, compare_asc)
elapsed = timer() - start
print(elapsed)
In this example i get RecursionError: maximum recursion depth exceeded in comparison
, so i added sys.setrecursionlimit(1000000)
and after that i get this output:
0.030029324000224733
5.489867554000284
Can anyone explain why it throws this recursion depth error only during sorting 2nd list ? And why there it is such big time difference ?
Upvotes: 4
Views: 117
Reputation: 70592
Here's a hint: pass a list where all the elements are the same, and watch what it does line by line. It will take time quadratic in the number of elements, and recurse to a level approximately equal to the number of elements.
The usual quicksort partition implementations proceed from both ends, so that in the all-equal case the list slice is approximately cut in half. You can get decent performance in this case for your "only look left-to-right" approach, but the clearest way to do so is to partition into three regions: "less than", "equal", and "greater than".
That can be done in a single left-to-right pass, and is usually called the "Dutch national flag problem". As the text on the linked page says,
The solution to this problem is of interest for designing sorting algorithms; in particular, variants of the quicksort algorithm that must be robust to repeated elements need a three-way partitioning function ...
For concreteness, here's a complete implementation doing one-pass "left to right" single-pivot 3-way partitioning. It also incorporates other well-known changes needed to make a quicksort robust for production use. Note:
O(N*log(N))
time, and (as below, for one way) make worst-case O(N**2)
time unlikely.The code:
from random import randrange
def partition(a, lo, hi, pivot):
i = L = lo
R = hi
# invariants:
# a[lo:L] < pivot
# a[L:i] == pivot
# a[i:R] unknown
# a[R:hi] > pivot
while i < R:
elt = a[i]
if elt < pivot:
a[L], a[i] = elt, a[L]
L += 1
i += 1
elif elt > pivot:
R -= 1
a[R], a[i] = elt, a[R]
else:
i += 1
return L, R
def qsort(a, lo=0, hi=None):
if hi is None:
hi = len(a)
while True: # sort a[lo:hi] in place
if hi - lo <= 1:
return
# select pivot ar random; else it's easy to construct
# inputs that systematically require quadratic time
L, R = partition(a, lo, hi, a[randrange(lo, hi)])
# must recur on only the shorter chunk to guarantee
# worst-case recursion depth is logarithmic in hi-lo
if L - lo <= hi - R:
qsort(a, lo, L)
# loop to do qsort(a, R, hi)
lo = R
else:
qsort(a, R, hi)
# loop to do qsort(a, lo, L)
hi = L
Upvotes: 3