Reputation: 2813
I am working through timing the insertion-sort algorithm for array-based implementations of varying lengths.
I know insertion-sort is average case O(n^2) so I realize that it will take a bit when you attempt to sort large arrays, but why would there be a nearly 6500ms difference between the bottom two implementations for an array of 100,000 entries?
This is the setup for my array filled with random integers from 1-1 million
int[] oneHundredThousand = new int[100000];
Random r = new Random();//RANDOMIZER
for(int i = 0; i < oneHundredThousand.length; i++)
oneHundredThousand[i] = r.nextInt(1000000) + 1; //1 - 1000000
These are the 2 methods I run for testing, which use insertion-sort
public static long insertionSort1(int[] intArray) {
long startTime = System.currentTimeMillis();
int n = intArray.length;
for (int k = 1; k < n; k++) {
int cur = intArray[k];
int j = k;
while (j > 0 && intArray[j-1] > cur) {
intArray[j] = intArray[j-1];
j--;
}
intArray[j] = cur;
}
long endTime = System.currentTimeMillis();
long elapsedTime = endTime - startTime;
return elapsedTime;
}
And
public static long insertionSort2(int[] input){
long startTime = System.currentTimeMillis();
int temp;
for (int i = 1; i < input.length; i++) {
for(int j = i ; j > 0 ; j--){
if(input[j] < input[j-1]){
temp = input[j];
input[j] = input[j-1];
input[j-1] = temp;
}
}
}
long endTime = System.currentTimeMillis();
long elapsedTime = endTime - startTime;
return elapsedTime;
}
Now calling these methods in main like so (copying the arrays so the original order is preserved through each 'sort'), I get the commented results, why is it so much different? I do not feel like different implementations of the same algorithm should be so much less efficient.
int[] copy100_1 = Arrays.copyOf(oneHundredThousand, oneHundredThousand.length);
int[] copy100_2 = Arrays.copyOf(oneHundredThousand, oneHundredThousand.length);
//816ms
System.out.print(insertionSort1(copy100_1));
//7400ms
System.out.print(insertionSort2(copy100_2));
Upvotes: 0
Views: 158
Reputation: 20185
Analyzing Insertion Sort, one finds that the best case wrt. execution time is O(n²). Let us take your first implementation to see why.
public static long insertionSort1(int[] intArray) {
long startTime = System.currentTimeMillis();
int n = intArray.length;
for (int k = 1; k < n; k++) {
int cur = intArray[k];
int j = k;
while (j > 0 && intArray[j-1] > cur) { // This loop can break early due
// to intArray[j-1] > cur
intArray[j] = intArray[j-1];
j--;
}
intArray[j] = cur;
}
long endTime = System.currentTimeMillis();
long elapsedTime = endTime - startTime;
return elapsedTime;
}
This means that for a (partially) sorted array, the execution time for (this part) reduces to O(n).
Your second implementation is lacking such an early break condition. But you can add it quite simply:
public static void insertionSort2(int[] input) {
int temp;
for (int i = 1; i < input.length; i++) {
for (int j = i; j > 0; j--) {
if (input[j] < input[j - 1]) {
temp = input[j];
input[j] = input[j - 1];
input[j - 1] = temp;
} else {
break; // this is the "early break" missing.
}
}
}
}
I have taken @amanin's answer as a template and re-implemented all three versions.
package benchmark;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class Test {
@State(Scope.Benchmark)
public static class Input {
public static final Random rng = new Random();
final int[] array;
final int[] expected;
public Input() {
final Random r = new Random();
this.array = new int[200_000];
for (int i = 0; i < this.array.length; i++) {
this.array[i] = i;
}
this.expected = Arrays.copyOf(this.array, this.array.length);
// Fisher-Yates shuffle
for (int i = this.array.length - 1; i > 0; --i) {
int swap = Input.rng.nextInt(i);
int tmp = this.array[swap];
this.array[swap] = this.array[i];
this.array[i] = tmp;
}
}
}
@Benchmark
public void benchSort1(final Input in) {
insertionSort1(in.array);
}
@Benchmark
public void benchSort2(final Input in) {
insertionSort2(in.array);
}
@Benchmark
public void benchSort3(final Input in) {
insertionSort3(in.array);
}
public static void insertionSort1(int[] intArray) {
int n = intArray.length;
for (int k = 1; k < n; k++) {
int cur = intArray[k];
int j = k;
while (j > 0 && intArray[j - 1] > cur) {
intArray[j] = intArray[j - 1];
j--;
}
intArray[j] = cur;
}
}
public static void insertionSort2(int[] input) {
int temp;
for (int i = 1; i < input.length; i++) {
for (int j = i; j > 0; j--) {
if (input[j] < input[j - 1]) {
temp = input[j];
input[j] = input[j - 1];
input[j - 1] = temp;
}
}
}
}
public static void insertionSort3(int[] input) {
int temp;
for (int i = 1; i < input.length; i++) {
for (int j = i; j > 0; j--) {
if (input[j] < input[j - 1]) {
temp = input[j];
input[j] = input[j - 1];
input[j - 1] = temp;
} else {
break;
}
}
}
}
public static void main(String[] arg) throws Exception {
Options option = new OptionsBuilder().include(Test.class.getSimpleName()).build();
new Runner(option).run();
}
}
These were the final results, with benchSort1
and benchSort2
being your original versions and benchSort3
the "corrected" double-for
version:
Benchmark Mode Cnt Score Error Units
Test.benchSort1 avgt 2 0.307 ms/op
Test.benchSort2 avgt 2 15145.611 ms/op
Test.benchSort3 avgt 2 0.210 ms/op
As you can see, both versions are now pretty close together time-wise.
Upvotes: 2
Reputation: 4139
To complete above answers, I'd like to tell that to properly compare performance of both your algorithms, you should consider using a real Benchmark framework like JMH.
So, why is that ? With a simple main, your program is too dependant of your computer state: if on your second sorting operation, your machine starts to swap, or another program consume a lot of processing power, it will degrade your java application performance, and so your measures on the second algorithm. Plus, you have no control over JIT optimizations.
JMH tries to provide much safer measures by repeating your tests multiple times, with startup phase to harness JIT before measuring, etc.
Here is a sample example of a class benchmarking your algorithms :
// Those annotations control benchmark configuration. More info on JMH doc
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
public class SortingBenchmark {
/**
* We create objects in charge of preparing data required by our benchmarks.
* It is created before benchmark starts, so array initialization phase does
* not pollute measures.
*/
@State(Scope.Benchmark)
public static class Input {
final int[] array;
public Input() {
final Random r = new Random();
array = new int[100000];
for (int i = 0; i < array.length; i++) {
array[i] = r.nextInt(1000000) + 1;
}
}
}
/**
* Test first sorting method
* @param in
*/
@Benchmark
public void benchSort1(final Input in) {
insertionSort1(in.array);
}
/**
* Test second sorting method
* @param in
*/
@Benchmark
public void benchSort2(final Input in) {
insertionSort2(in.array);
}
public static long insertionSort1(int[] intArray) {
long startTime = System.currentTimeMillis();
int n = intArray.length;
for (int k = 1; k < n; k++) {
int cur = intArray[k];
int j = k;
while (j > 0 && intArray[j - 1] > cur) {
intArray[j] = intArray[j - 1];
j--;
}
intArray[j] = cur;
}
long endTime = System.currentTimeMillis();
long elapsedTime = endTime - startTime;
return elapsedTime;
}
public static long insertionSort2(int[] input) {
long startTime = System.currentTimeMillis();
int temp;
for (int i = 1; i < input.length; i++) {
for (int j = i; j > 0; j--) {
if (input[j] < input[j - 1]) {
temp = input[j];
input[j] = input[j - 1];
input[j - 1] = temp;
}
}
}
long endTime = System.currentTimeMillis();
long elapsedTime = endTime - startTime;
return elapsedTime;
}
/**
* That's JMH boilerplate to launch the benchmark.
* @param arg
* @throws Exception
*/
public static void main(String[] arg) throws Exception {
Options option = new OptionsBuilder().include(SortingBenchmark.class.getSimpleName()).build();
new Runner(option).run();
}
And here is the result (Just final conclusion of JMH, to avoid polluting the thread):
# Run complete. Total time: 00:01:51
Benchmark Mode Cnt Score Error Units
SortingBenchmark.benchSort1 avgt 5 0,190 ± 0,016 ms/op
SortingBenchmark.benchSort2 avgt 5 1957,455 ± 73,014 ms/op
I hope it will help you for future tests (note: here is a little tutorial which helped me to learn JMH: JMH tuto).
Upvotes: 2