Reputation: 41
I am trying to measure the runtimes of two sorting algorithms in Java, insertion sort and merge sort. The program runs both sorts on an unsorted ArrayList of 433 words multiple times, records the elapsed time to sort 100, 200, 300, 400 and 433 words (the whole list), and then prints the average time for each of these counts.
My code is, I believe, OK. However, I'm seeing a strange anomaly that I hope somebody can help me understand.
Here are the results when both of the sorts are executed once:
Here are the results when both of the sorts are executed 10,000 times:
When run once, the results are, I believe, as expected: insertion sort is faster for the smaller element counts, while merge sort is faster for the larger counts and the whole array.
However, when run 10,000 times, the average timings are way off: insertion sort is massively faster for all element counts.
It's as if the insertion sort speeds up with each iteration. How can this be possible?
The code for both sort algorithms, and the method used to run multiple iterations of them, is in a comment below.
Thanks for any help you can provide.
Upvotes: 1
Views: 217
Reputation: 144750
The time complexity of these algorithms is well known: O(N²) for insertion sort and O(N log N) for merge sort.
Here are possible reasons for your unexpected observation:
A dataset of 400 strings is not very large, so the quality of the implementation may matter more than the asymptotic complexity of the algorithms.
Your implementation of insertion sort is not very efficient, but at least it operates in place, so its effective time complexity is O(N²). You should, however, remove the measuring code that runs every 100 elements and has non-trivial complexity of its own.
Your implementation of merge sort is quite inefficient: it allocates multiple dynamic arrays, one element at a time, for every split and merge phase. This is very time-consuming and creates many objects that are discarded almost immediately, which the garbage collector must reclaim at great expense. A version that reuses a single temporary array is sketched after this list.
A single call to merge sort might still perform better than insertion sort, if the timing is meaningful at all, but repeated calls may trigger the garbage collector and add substantial overhead. Your timings do not show clear evidence of this, possibly because 10,000 iterations is not enough.
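Here is a minimal sketch of what such a merge sort could look like, assuming the data is held in a String array; the class and method names are illustrative, not taken from your code:

```java
// Illustrative sketch only: a top-down merge sort over String[] that
// allocates one scratch buffer up front instead of building new lists
// at every split and merge. Names are hypothetical, not from the question.
public class MergeSortSketch {

    public static void mergeSort(String[] a) {
        String[] buf = new String[a.length];   // single temporary array
        sort(a, buf, 0, a.length);
    }

    private static void sort(String[] a, String[] buf, int lo, int hi) {
        if (hi - lo < 2) return;               // 0 or 1 element: already sorted
        int mid = (lo + hi) >>> 1;
        sort(a, buf, lo, mid);
        sort(a, buf, mid, hi);
        merge(a, buf, lo, mid, hi);
    }

    private static void merge(String[] a, String[] buf, int lo, int mid, int hi) {
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi)
            buf[k++] = a[i].compareTo(a[j]) <= 0 ? a[i++] : a[j++];
        while (i < mid) buf[k++] = a[i++];
        while (j < hi)  buf[k++] = a[j++];
        System.arraycopy(buf, lo, a, lo, hi - lo);  // copy merged run back
    }
}
```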
The real explanation is actually simple: since your insertion sort implementation sorts the dataset in place, the list is already sorted for every subsequent call, and sorted input is the best case for insertion sort, where it runs in linear time.
You should sort copies of the initial dataset for a more meaningful benchmark, as sketched below. Also look for a better merge sort implementation that uses a single temporary array, sorts the elements in place, and avoids growing dynamic arrays when the size is known in advance.
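As a rough illustration of the copy-before-sort idea, here is a hedged sketch; `insertionSort` and `words` are placeholders for your own method and data, not real APIs:

```java
import java.util.ArrayList;
import java.util.List;

public class BenchmarkSketch {

    // Placeholder for the poster's own insertion sort; swap in the real one.
    static void insertionSort(List<String> list) {
        list.sort(String::compareTo);  // stand-in so the sketch compiles
    }

    // Time `runs` executions, sorting a fresh copy of the unsorted data
    // each time so no run benefits from the previous run's work.
    static long averageNanos(List<String> words, int runs) {
        long total = 0;
        for (int run = 0; run < runs; run++) {
            List<String> copy = new ArrayList<>(words);  // fresh unsorted copy
            long start = System.nanoTime();
            insertionSort(copy);                         // sort the copy, not the original
            total += System.nanoTime() - start;
        }
        return total / runs;                             // average ns per run
    }
}
```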
Upvotes: 2