Arryyyy
Arryyyy

Reputation: 101

QuickSort works slow for low range data

lately im comparing different types of sort algorithms in python. I noticed that my quicksort isnt handling well inputs where values are repeated.

def compare_asc(a, b):
    return a <= b


def partition(a, p, r, compare):
    pivot = a[r]
    i = p-1
    for j in range(p, r):
        if compare(a[j], pivot):
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i+1], a[r] = a[r], a[i+1]
    return i + 1


def part_quick_sort(a, p, r, compare):
    if p < r:
        q = partition(a, p, r, compare)
        part_quick_sort(a, p, q-1, compare)
        part_quick_sort(a, q+1, r, compare)


def quick_sort(a, compare):
    part_quick_sort(a, 0, len(a)-1, compare)
    return a

Then I test this

import numpy as np
from timeit import default_timer as timer
import sys

test_list1 = np.random.randint(-10000, 10000, size=10000).tolist()
start = timer()
test_list1 = quick_sort(test_list1, compare_asc)
elapsed = timer() - start
print(elapsed)

test_list2 = np.random.randint(0, 2, size=10000).tolist()
start = timer()
test_list2 = quick_sort(test_list2, compare_asc)
elapsed = timer() - start
print(elapsed)

In this example i get RecursionError: maximum recursion depth exceeded in comparison, so i added sys.setrecursionlimit(1000000) and after that i get this output:

0.030029324000224733
5.489867554000284

Can anyone explain why it throws this recursion depth error only during sorting 2nd list ? And why there it is such big time difference ?

Upvotes: 4

Views: 117

Answers (1)

Tim Peters
Tim Peters

Reputation: 70592

Here's a hint: pass a list where all the elements are the same, and watch what it does line by line. It will take time quadratic in the number of elements, and recurse to a level approximately equal to the number of elements.

The usual quicksort partition implementations proceed from both ends, so that in the all-equal case the list slice is approximately cut in half. You can get decent performance in this case for your "only look left-to-right" approach, but the clearest way to do so is to partition into three regions: "less than", "equal", and "greater than".

That can be done in a single left-to-right pass, and is usually called the "Dutch national flag problem". As the text on the linked page says,

The solution to this problem is of interest for designing sorting algorithms; in particular, variants of the quicksort algorithm that must be robust to repeated elements need a three-way partitioning function ...

CODE

For concreteness, here's a complete implementation doing one-pass "left to right" single-pivot 3-way partitioning. It also incorporates other well-known changes needed to make a quicksort robust for production use. Note:

  • You cannot create a pure quicksort that avoids worst-case quadratic time. The best you can do is average-case O(N*log(N)) time, and (as below, for one way) make worst-case O(N**2) time unlikely.
  • You can (as below) guarantee worst-case logarithmic recursion depth.
  • In this approach, a list of all-equal elements is not a bad case, but a very good case: the partitioning routine is called just once total.

The code:

from random import randrange

def partition(a, lo, hi, pivot):
    i = L = lo
    R = hi
    # invariants:
    # a[lo:L]  < pivot
    # a[L:i]  == pivot
    # a[i:R]     unknown
    # a[R:hi]  > pivot
    while i < R:
        elt = a[i]
        if elt < pivot:
            a[L], a[i] = elt, a[L]
            L += 1
            i += 1
        elif elt > pivot:
            R -= 1
            a[R], a[i] = elt, a[R]
        else:
            i += 1
    return L, R

def qsort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a)

    while True:   # sort a[lo:hi] in place
        if hi - lo <= 1:
            return

        # select pivot ar random; else it's easy to construct
        # inputs that systematically require quadratic time
        L, R = partition(a, lo, hi, a[randrange(lo, hi)])

        # must recur on only the shorter chunk to guarantee
        # worst-case recursion depth is logarithmic in hi-lo
        if L - lo <= hi - R:
            qsort(a, lo, L)
            # loop to do qsort(a, R, hi)
            lo = R
        else:
            qsort(a, R, hi)
            # loop to do qsort(a, lo, L)
            hi = L

Upvotes: 3

Related Questions