Bubble Sort outperforming Selection Sort by a large margin

I am currently trying to understand sorting algorithms, and have been looking at pseudocode and converting it to python (using python 3.6, IDE is Spyder 3.1.2). I've written a simple Bubble Sort:

def BubbleSort(array_to_sort):
    n = len(array_to_sort) - 1
    swapped = True
    while (swapped):
        swapped = False
        for i in range(n):
             if array_to_sort[i] > array_to_sort[i+1]:
                array_to_sort[i+1], array_to_sort[i] = array_to_sort[i], array_to_sort[i+1] 
                swapped = True
    return array_to_sort

And a simple Selection Sort:

def SelectionSort(array_to_sort):
    n = len(array_to_sort)
    for i in range(n):
        minPos = i
        for j in range(i+1, n):
            if array_to_sort[j] < array_to_sort[minPos]:
                minPos=j

        if minPos != i:
            array_to_sort[i], array_to_sort[minPos] = array_to_sort[minPos], array_to_sort[i] 

    return array_to_sort

And am trying to time them like so:

def main():
    array = RandomNumberArray()
    a = timeit.Timer(functools.partial(BubbleSort, array)) 
    print(a.timeit(1000))
    b = timeit.Timer(functools.partial(SelectionSort, array)) 
    print(b.timeit(1000))

RandomNumberArray just generates a list of numbers that I'd like to sort:

def RandomNumberArray():
    return random.sample(range(0, 1000), 1000)

As they have both have the same time complexity O(n²), I was expecting them to take roughly the same amount of time however I've found that my Selection Sort is performing comparatively worse to my Bubble Sort e.g. for an array with 1000 numbers over 1000 iterations -
BS results: 0.390 s
SS results: 63.618 s

And an array of 1000 numbers over 10000 iterations -
BS results: 2.074 s
SS results: 645.944 s

Is there an issue with the implementation of these algorithms or is it expected that there's such a large discrepancy? I'm aware that there are other faster sorts and in practice no one would really use a BS or SS but I'm just trying to understand why the SS appears to be so much slower than BS?

Upvotes: 1

Answers (3)

Jim Mischel

Reputation: 134035

User @richard pointed out the primary problem.

In general, I would expect a well-written bubble sort to outperform selection sort every time. Your bubble sort is not particularly well-written, because is does not take advantage of the partial ordering that the algorithm provides after each iteration of the outer loop. That is, you have:

def BubbleSort(array_to_sort):
    n = len(array_to_sort) - 1
    swapped = True
    while (swapped):
        swapped = False
        for i in range(n):
             if array_to_sort[i] > array_to_sort[i+1]:
                array_to_sort[i+1], array_to_sort[i] = array_to_sort[i], array_to_sort[i+1] 
                swapped = True
    return array_to_sort

Every time through the outer loop, one more item is being pushed to the end of the array, in order. That is, the first time through the loop, the maximum item is pushed to the last position. Next time, the 2nd largest is pushed to the next-to-last position, etc. But your code doesn't take advantage of that. Instead, it continually compares items that are already sorted. You can improve your code dramatically with:

def BubbleSort(array_to_sort):
    n = len(array_to_sort) - 1
    swapped = True
    j = 0
    while (swapped):
        swapped = False
        for i in range(0, n-j):
             if array_to_sort[i] > array_to_sort[i+1]:
                array_to_sort[i+1], array_to_sort[i] = array_to_sort[i], array_to_sort[i+1] 
                swapped = True
        ++j;
    return array_to_sort

Bubble sort is generally faster than selection sort for two reasons: 1) there's an "early out" that lets you quit if no swaps were made. 2) locality of reference is much better; you're always comparing and swapping adjacent items.

Upvotes: 1

Tim Peters

Reputation: 70685

A more valuable optimization of bubble sort is to exploit that the last swap made by an iteration leaves the tail end of the array, beyond that point, in its final state. For example, in the first pass over:

a = [2, 1, 3, 4, 5, 6, 7, 8, 9, 10]

only a[0] and a[1] are swapped, and that implies there's no need to do more. The first pass already established that a[0] < a[1] <= a[2] <= ... <= a[9].

This is no harder to code than a swapped flag, but is much more effective. Note that this code does not return the list. You may have noticed that methods of Python objects that mutate the object (e.g., list.sort()) generally return None. This is deliberate, to remind you that you've mutated the object. Otherwise, as in the original testing code you posted, it's too easy to mistakenly assume the function left the original object alone.

def bubbly(a):
    n = len(a)
    while n:
        lastswap = 0
        for i in range(1, n):
            if a[i-1] > a[i]:
                a[i-1], a[i] = a[i], a[i-1]
                lastswap = i
        n = lastswap

Upvotes: 0

Richard

Reputation: 61309

It's not a fair comparison because array gets sorted by the first call and is then passed to the second call. There are several sort algorithms for which already sorted inputs are a worst case.

Bubble sort also has O(n) best case time complexity while selection has O(n^2) best case.

Upvotes: 6

Bubble Sort outperforming Selection Sort by a large margin

Answers (3)

Related Questions