Reputation: 21
I have read that searching time in TreeSet is of the order of log(N) and in HashSet is of the order of 1 (Constant). Now, to test this in a program I have written a Java program.
import java.util.TreeSet;
import java.util.Random;
public class SearchSpeed {
int totalElements, target;
Set<Integer> hashSet, treeSet;
Random r;
// parameterized constructor
SearchSpeed(int totalElements){
this.totalElements = totalElements;
r = new Random();
hashSet = new HashSet<>();
treeSet = new TreeSet<>();
}
// populating the sets and selecting the target element
void populateSets(){
hashSet.clear();
treeSet.clear();
int num, count = 0;
while(hashSet.size() != totalElements){
num = r.nextInt();
hashSet.add(num);
treeSet.add(num);
count++;
if(count == 5000){ // picking the 5000th element as the target element
target = num;
}
}
}
// searching the target element in HashSet
long searchHash(){
long initialTime = System.nanoTime();
hashSet.contains(target);
long finalTime = System.nanoTime();
return finalTime - initialTime;
}
// searching the target element in TreeSet
long searchTree(){
long initialTime = System.nanoTime();
treeSet.contains(target);
long finalTime = System.nanoTime();
return finalTime - initialTime;
}
public static void main(String[] args) {
SearchSpeed ss = new SearchSpeed(10000);
long hashTime = 0, treeTime = 0;
for(int i = 0; i < 1000; i++){
ss.populateSets();
hashTime += ss.searchHash();
treeTime += ss.searchTree();
}
double avgTreeTime = treeTime/1000.0;
double avgHashTime = hashTime/1000.0;
System.out.printf("Time taken by TreeSet MINUS Time Taken by HashSet = %.4f nanoseconds\n",avgTreeTime - avgHashTime);
}
}
After running this program several times, these are the results I got:
Output 1: Time taken by TreeSet MINUS Time Taken by HashSet = 105.6000 nanoseconds
Output 2: Time taken by TreeSet MINUS Time Taken by HashSet = 99.1000 nanoseconds
Output 3: Time taken by TreeSet MINUS Time Taken by HashSet = 36.8000 nanoseconds
Output 4: Time taken by TreeSet MINUS Time Taken by HashSet = 50.6000 nanoseconds
Output 5: Time taken by TreeSet MINUS Time Taken by HashSet = -30.4000 nanoseconds
I was expecting the time difference to be somewhat a larger value but, in some cases, the difference of time is as low as 36 nanoseconds and even negative in some cases.
Is this what I should be expecting? or have I missed something while writing the program?
Upvotes: 1
Views: 59
Reputation: 103637
Yes. You've missed a lot. So much, in fact, it's almost impossible for you to this right. Good news though, there's a library for that. Specifically, JMH.
The JVM is not a simple thing. It runs code slowly and does all sorts of seemingly pointless bookkeeping so that it knows which 0.1% of the code spends 95% of the CPU resources (those aren't exaggerated numbers; that's what most apps end up doing. An academic case that just tests the speed of some algorithm obviously isn't like that, but the JVM is optimized for real life apps, not test cases). Once it figures out what that is, it uses all that seemingly pointless bookkeeping to craft a highly optimized bit of machine code that e.g. is written to optimize branching (CPUs tend to be faster at 'do not jump' then 'do jump', which is what any loop structure is translated to. The JVM knows which 'path' is, so far, taken more often and optimizes accordingly. Compile-time only optimizers can't do that without hints).
That's one of about 900 reasons why you can't just rely on calculating the delta between 2 nanoTime() calls and extract meaningful conclusions about timing.
JMH fixes some of this. Not all, but most. Follow the tutorial linked above, hang your benchmark in a JMH harness, and then check your results.
Upvotes: 0