Reputation: 641
I'm making lists of 10000, 20000, 40000, 80000, and 160000 students and quicksorting them by grade (students with the same grade are sorted by student number). At the same time I am running a stopwatch to measure the execution time. I am using Sedgewick's quicksort: http://algs4.cs.princeton.edu/23quicksort/
When I print out the students they are all sorted perfectly, but I noticed something weird in the execution times:
     n      time
 10000     0.015
 20000     0.019
 40000     0.096
 80000     0.039
160000     0.109
For example, 40000 shows a high peak and then the time falls again at an even bigger input size. This behaviour is consistent across several test runs, and sometimes the peak is at 20000 instead. Is this caused by unlucky partitioning at those sizes? I am also curious how to estimate the big-O from the execution times.
Upvotes: 1
Views: 88
Reputation: 372814
I noticed that you're using Java as your programming language. It's tough to get reliable timing numbers in Java because the JVM does a lot of work beyond just running your code: it optimizes frequently-run code on the fly (just-in-time compilation), pauses for garbage collection (which can distort individual measurements), etc. Any of that work can land in the middle of a single timed run, which is exactly what an isolated spike like yours tends to look like.
Without seeing your code I can't say anything for certain, but I think it would be a good idea to run the sort many times in a tight loop and compute the median runtime. The median is less sensitive to outliers than the mean, so it should shrug off any unusually high runtimes caused by garbage collection, etc. You may also want to look into a high-quality Java profiler to get a more accurate estimate of the time.
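For instance, here's a rough sketch of that idea. Everything here is a placeholder: the class name, the warm-up count, and the use of `Arrays.sort` on random integers standing in for your quicksort on `Student` arrays; substitute your own sort and data.

    import java.util.Arrays;
    import java.util.Random;

    public class SortBenchmark {
        public static void main(String[] args) {
            int n = 40000;
            int trials = 21;  // odd, so the median is a single middle element
            long[] times = new long[trials];
            Random rng = new Random(42);

            // Warm-up: let the JIT compile the sorting code before measuring
            for (int i = 0; i < 5; i++) {
                timeOneSort(n, rng);
            }

            for (int t = 0; t < trials; t++) {
                times[t] = timeOneSort(n, rng);
            }

            Arrays.sort(times);
            System.out.printf("median for n=%d: %.3f ms%n",
                    n, times[trials / 2] / 1e6);
        }

        // Fills a fresh array with random values and returns the sort time in ns
        private static long timeOneSort(int n, Random rng) {
            Integer[] a = new Integer[n];
            for (int i = 0; i < n; i++) a[i] = rng.nextInt();
            long start = System.nanoTime();
            Arrays.sort(a);  // stand-in for your quicksort call
            return System.nanoTime() - start;
        }
    }

Note that each trial gets a fresh unsorted array; timing repeated sorts of the same (already sorted) array would measure something different.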
As to your second question - how do you compute big-O time complexity from runtimes? - the answer is that it's tricky. You can empirically estimate the time complexity of your code by looking at how the runtime grows as a function of the input size, often by looking at the ratio between runtimes for different sizes. There are tools that can do this for you automatically, like this tool for finding empirical computational complexity.

Be careful with results like these, though, because they won't necessarily give you the correct answer. For example, the simplex method for solving linear programs runs in worst-case exponential time, but it's extremely difficult to trigger that behavior, so just looking at empirical runtimes might cause you to miss it. Similarly, quicksort can degenerate to O(n²) on some inputs, even if randomization is used, but it can be really hard to detect that unless you run the algorithm on a huge number of inputs.
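A low-tech way to do the ratio test yourself is a doubling test: measure the median time at n and at 2n and take log base 2 of the ratio. For a linear algorithm the estimate sits near 1, for a quadratic one near 2, and for an O(n log n) sort slightly above 1. A sketch under the same stand-in assumptions as above:

    import java.util.Arrays;
    import java.util.Random;

    public class DoublingTest {
        public static void main(String[] args) {
            long prev = medianSortTime(10000);
            for (int n = 20000; n <= 160000; n *= 2) {
                long cur = medianSortTime(n);
                // log2 of t(2n)/t(n): ~1 for O(n), ~2 for O(n^2),
                // slightly above 1 for O(n log n)
                double exponent = Math.log((double) cur / prev) / Math.log(2);
                System.out.printf("n=%d  estimated exponent: %.2f%n", n, exponent);
                prev = cur;
            }
        }

        // Median of several timed runs, in nanoseconds (same idea as above)
        private static long medianSortTime(int n) {
            Random rng = new Random(42);
            int trials = 11;
            long[] times = new long[trials];
            for (int t = 0; t < trials; t++) {
                Integer[] a = new Integer[n];
                for (int i = 0; i < n; i++) a[i] = rng.nextInt();
                long start = System.nanoTime();
                Arrays.sort(a);  // stand-in for the quicksort being measured
                times[t] = System.nanoTime() - start;
            }
            Arrays.sort(times);
            return times[trials / 2];
        }
    }

On your machine the estimates for small n will be noisy for exactly the JVM-related reasons above; the larger sizes should give a more stable picture.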
Upvotes: 4