ViaTech
ViaTech

Reputation: 2813

Why does insertion-sort run so much differently time-wise with different implementation

I am working through timing the insertion-sort algorithm for array-based implementations of varying lengths.

I know insertion-sort is average case O(n^2) so I realize that it will take a bit when you attempt to sort large arrays, but why would there be a nearly 6500ms difference between the bottom two implementations for an array of 100,000 entries?

This is the setup for my array filled with random integers from 1-1 million

int[] oneHundredThousand = new int[100000];
Random r = new Random();//RANDOMIZER

for(int i = 0; i < oneHundredThousand.length; i++)
        oneHundredThousand[i] = r.nextInt(1000000) + 1; //1 - 1000000

These are the 2 methods I run for testing, which use insertion-sort

public static long insertionSort1(int[] intArray) {
    long startTime = System.currentTimeMillis();

    int n = intArray.length;
    for (int k = 1; k < n; k++) {         
        int cur = intArray[k];                
        int j = k;                          
        while (j > 0 && intArray[j-1] > cur) {  
            intArray[j] = intArray[j-1];              
            j--;                              
        }
        intArray[j] = cur;                      
    }

    long endTime = System.currentTimeMillis();
    long elapsedTime = endTime - startTime;
    return elapsedTime;
}

And

public static long insertionSort2(int[] input){
    long startTime = System.currentTimeMillis();
    int temp;
    for (int i = 1; i < input.length; i++) {
        for(int j = i ; j > 0 ; j--){
            if(input[j] < input[j-1]){
                temp = input[j];
                input[j] = input[j-1];
                input[j-1] = temp;
            }
        }
    }
    long endTime = System.currentTimeMillis();
    long elapsedTime = endTime - startTime;
    return elapsedTime;
}

Now calling these methods in main like so (copying the arrays so the original order is preserved through each 'sort'), I get the commented results, why is it so much different? I do not feel like different implementations of the same algorithm should be so much less efficient.

int[] copy100_1 = Arrays.copyOf(oneHundredThousand, oneHundredThousand.length);
int[] copy100_2 = Arrays.copyOf(oneHundredThousand, oneHundredThousand.length);

//816ms 
System.out.print(insertionSort1(copy100_1));
//7400ms
System.out.print(insertionSort2(copy100_2));

Upvotes: 0

Views: 158

Answers (2)

Turing85
Turing85

Reputation: 20185

Analyzing Insertion Sort, one finds that the best case wrt. execution time is O(n²). Let us take your first implementation to see why.

public static long insertionSort1(int[] intArray) {
    long startTime = System.currentTimeMillis();

    int n = intArray.length;
    for (int k = 1; k < n; k++) {         
        int cur = intArray[k];                
        int j = k;

        while (j > 0 && intArray[j-1] > cur) { // This loop can break early due 
                                               // to intArray[j-1] > cur
            intArray[j] = intArray[j-1];              
            j--;                              
        }
        intArray[j] = cur;                      
    }

    long endTime = System.currentTimeMillis();
    long elapsedTime = endTime - startTime;
    return elapsedTime;
}

This means that for a (partially) sorted array, the execution time for (this part) reduces to O(n).

Your second implementation is lacking such an early break condition. But you can add it quite simply:

public static void insertionSort2(int[] input) {
    int temp;
    for (int i = 1; i < input.length; i++) {
        for (int j = i; j > 0; j--) {
            if (input[j] < input[j - 1]) {
                temp = input[j];
                input[j] = input[j - 1];
                input[j - 1] = temp;
            } else {
                break; // this is the "early break" missing.
            }
        }
    }
}

I have taken @amanin's answer as a template and re-implemented all three versions.

package benchmark;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;

@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Test {

    @State(Scope.Benchmark)
    public static class Input {

        public static final Random rng = new Random();
        final int[] array;
        final int[] expected;

        public Input() {
            final Random r = new Random();
            this.array = new int[200_000];
            for (int i = 0; i < this.array.length; i++) {
                this.array[i] = i;
            }
            this.expected = Arrays.copyOf(this.array, this.array.length);

            // Fisher-Yates shuffle
            for (int i = this.array.length - 1; i > 0; --i) {
                int swap = Input.rng.nextInt(i);
                int tmp = this.array[swap];
                this.array[swap] = this.array[i];
                this.array[i] = tmp;
            }
        }
    }

    @Benchmark
    public void benchSort1(final Input in) {
        insertionSort1(in.array);
    }

    @Benchmark
    public void benchSort2(final Input in) {
        insertionSort2(in.array);
    }

    @Benchmark
    public void benchSort3(final Input in) {
        insertionSort3(in.array);
    }

    public static void insertionSort1(int[] intArray) {
        int n = intArray.length;
        for (int k = 1; k < n; k++) {
            int cur = intArray[k];
            int j = k;
            while (j > 0 && intArray[j - 1] > cur) {
                intArray[j] = intArray[j - 1];
                j--;
            }
            intArray[j] = cur;
        }
    }

    public static void insertionSort2(int[] input) {
        int temp;
        for (int i = 1; i < input.length; i++) {
            for (int j = i; j > 0; j--) {
                if (input[j] < input[j - 1]) {
                    temp = input[j];
                    input[j] = input[j - 1];
                    input[j - 1] = temp;
                }
            }
        }
    }

    public static void insertionSort3(int[] input) {
        int temp;
        for (int i = 1; i < input.length; i++) {
            for (int j = i; j > 0; j--) {
                if (input[j] < input[j - 1]) {
                    temp = input[j];
                    input[j] = input[j - 1];
                    input[j - 1] = temp;
                } else {
                    break;
                }
            }
        }
    }

    public static void main(String[] arg) throws Exception {
        Options option = new OptionsBuilder().include(Test.class.getSimpleName()).build();
        new Runner(option).run();
    }
}

These were the final results, with benchSort1 and benchSort2 being your original versions and benchSort3 the "corrected" double-for version:

Benchmark        Mode  Cnt      Score   Error  Units
Test.benchSort1  avgt    2      0.307          ms/op
Test.benchSort2  avgt    2  15145.611          ms/op
Test.benchSort3  avgt    2      0.210          ms/op

As you can see, both versions are now pretty close together time-wise.

Upvotes: 2

amanin
amanin

Reputation: 4139

To complete above answers, I'd like to tell that to properly compare performance of both your algorithms, you should consider using a real Benchmark framework like JMH.

So, why is that ? With a simple main, your program is too dependant of your computer state: if on your second sorting operation, your machine starts to swap, or another program consume a lot of processing power, it will degrade your java application performance, and so your measures on the second algorithm. Plus, you have no control over JIT optimizations.

JMH tries to provide much safer measures by repeating your tests multiple times, with startup phase to harness JIT before measuring, etc.

Here is a sample example of a class benchmarking your algorithms :

// Those annotations control benchmark configuration. More info on JMH doc
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class SortingBenchmark {

/**
 * We create objects in charge of preparing data required by our benchmarks.
 * It is created before benchmark starts, so array initialization phase does
 * not pollute measures.
 */
@State(Scope.Benchmark)
public static class Input {

    final int[] array;

    public Input() {
        final Random r = new Random();
        array = new int[100000];
        for (int i = 0; i < array.length; i++) {
            array[i] = r.nextInt(1000000) + 1;
        }
    }
}

/**
 * Test first sorting method
 * @param in
 */
@Benchmark
public void benchSort1(final Input in) {
    insertionSort1(in.array);
}

/**
 * Test second sorting method
 * @param in
 */
@Benchmark
public void benchSort2(final Input in) {
    insertionSort2(in.array);
}

public static long insertionSort1(int[] intArray) {
    long startTime = System.currentTimeMillis();

    int n = intArray.length;
    for (int k = 1; k < n; k++) {
        int cur = intArray[k];
        int j = k;
        while (j > 0 && intArray[j - 1] > cur) {
            intArray[j] = intArray[j - 1];
            j--;
        }
        intArray[j] = cur;
    }

    long endTime = System.currentTimeMillis();
    long elapsedTime = endTime - startTime;
    return elapsedTime;
}

public static long insertionSort2(int[] input) {
    long startTime = System.currentTimeMillis();
    int temp;
    for (int i = 1; i < input.length; i++) {
        for (int j = i; j > 0; j--) {
            if (input[j] < input[j - 1]) {
                temp = input[j];
                input[j] = input[j - 1];
                input[j - 1] = temp;
            }
        }
    }
    long endTime = System.currentTimeMillis();
    long elapsedTime = endTime - startTime;
    return elapsedTime;
}

/**
 * That's JMH boilerplate to launch the benchmark.
 * @param arg
 * @throws Exception
 */
public static void main(String[] arg) throws Exception {
    Options option = new OptionsBuilder().include(SortingBenchmark.class.getSimpleName()).build();
    new Runner(option).run();
}

And here is the result (Just final conclusion of JMH, to avoid polluting the thread):

# Run complete. Total time: 00:01:51

Benchmark                    Mode  Cnt     Score    Error  Units
SortingBenchmark.benchSort1  avgt    5     0,190 ±  0,016  ms/op
SortingBenchmark.benchSort2  avgt    5  1957,455 ± 73,014  ms/op

I hope it will help you for future tests (note: here is a little tutorial which helped me to learn JMH: JMH tuto).

Upvotes: 2

Related Questions