tacos_tacos_tacos
tacos_tacos_tacos

Reputation: 10585

Why do these two bubble sort implementations differ significantly in runtime if they have similar numbers of steps and swaps?

I was trying to implement bubble sort in C. I've done so in this gist.

I implemented the algorithm laid out on Wikipedia's article for bubble sort, sort_bubble, and compared it to a reference implementation I found on github, bubble_sort:

typedef struct Bubble_Sort_Stats {
    int num_swaps;
    int num_steps;
} bubble_sort_stats_t;

bubble_sort_stats_t bubble_sort(int arr[], int n) {
    bubble_sort_stats_t stats;
    stats.num_swaps = 0;
    stats.num_steps = 0;

    int temp;
    int i;
    int j;

    while (i < n) {
        j = 0;
        while (j < i) {
            stats.num_steps++;
            if (arr[j] > arr[i]) {
                temp = arr[j];
                arr[j] = arr[i];
                arr[i] = temp;
                stats.num_swaps++;
            }
            j++;
        }
        i++;
    }
    return stats;
}

bubble_sort_stats_t sort_bubble(int array[], int length_of_array) {
    bubble_sort_stats_t stats;
    stats.num_swaps = 0;
    stats.num_steps = 0;

    int n = length_of_array;
    int new_n;
    while (n >= 1) {
        new_n = 0;
        for (int i = 0; i < n - 1; i++) {
            stats.num_steps++;

            if (array[i] > array[i+1]) {
                int l = array[i];
                stats.num_swaps++;
                new_n = i + 1;
                array[i] = array[i + 1];
                array[i + 1] = l;
            }
        }
        n = new_n;
    }

    return stats;
}

#define BIG 10000

int main() {
    int nums1[BIG], nums2[BIG];
    for (int i = 0; i < BIG; i++) {
        int newInt = rand() * BIG;;
        nums1[i] = newInt;
        nums2[i] = newInt;
    }

    long start, end;
    bubble_sort_stats_t stats;

    start = clock();
    stats = bubble_sort(nums2, BIG);
    end = clock();
    printf("It took %ld ticks and %d steps to do %d swaps\n\n", end - start, stats.num_steps, stats.num_swaps);

    start = clock();
    stats = sort_bubble(nums1, BIG);
    end = clock();
    printf("It took %ld ticks and %d steps to do %d swaps\n\n", end - start, stats.num_steps, stats.num_swaps);

    for (int i = 0; i < BIG; i++) {
        if (nums1[i] != nums2[i]) {
            printf("ERROR at position %d - nums1 value: %d, nums2 value: %d", i, nums1[i], nums2[i]);
        }

        if (i > 0) {
            if (nums1[i - 1] > nums1[i]) {
                printf("BAD SORT at position %d - nums1 value: %d", i, nums1[i]);
            }
        }
    }

    return 0;
}

Now when I run this program I get these results:

It took 125846 ticks and 49995000 steps to do 25035650 swaps

It took 212430 ticks and 49966144 steps to do 25035650 swaps

That is, the number of swaps is identical, and sort_bubble actually takes fewer steps, but it takes almost twice as long for this size of array!

My suspicion is that the difference has something to do with the control structure itself, the indices, something like that. But I don't really know enough about how the c compiler works to guess further, and I don't know how I would go about even determining this by debugging.

So I would like to know why but also how I could figure this out empirically.

Upvotes: 1

Views: 68

Answers (1)

Nelfeal
Nelfeal

Reputation: 13269

Your bubble_sort isn't actually a bubble sort: it doesn't only compare adjacent pairs.

It is an insertion sort with the inner loop oddly reversed, which still works as intended. It can be rewritten as follows, without changing the number of steps or swaps.

bubble_sort_stats_t bubble_sort(int arr[], int n) {
    bubble_sort_stats_t stats;
    stats.num_swaps = 0;
    stats.num_steps = 0;

    for (int i = 1; i < n; i++) {
        for (int j = i; j > 0; j--) {
            stats.num_steps++;
            if (arr[j-1] > arr[j]) {
                int temp = arr[j];
                arr[j] = arr[j-1];
                arr[j-1] = temp;

                stats.num_swaps++;
            }
        }
    }
    return stats;
}

To get a proper insertion sort from this, simply move the if condition into the inner loop, as follows.

bubble_sort_stats_t bubble_sort(int arr[], int n) {
    bubble_sort_stats_t stats;
    stats.num_swaps = 0;
    stats.num_steps = 0;

    for (int i = 1; i < n; i++) {
        for (int j = i; j > 0 && arr[j-1] > arr[j]; j--) {
            stats.num_steps++;

            int temp = arr[j];
            arr[j] = arr[j-1];
            arr[j-1] = temp;

            stats.num_swaps++;
        }
    }
    return stats;
}

This way, you can see that the number of steps is actually equal to the number of swaps, and less than the number of steps of the actual bubble sort (sort_bubble).

Upvotes: 3

Related Questions