Reputation: 427
I know that we prefer ArrayList
over HashSet
when we need to store duplicates, and HashSet
uses hashCode()
function to calculate an index for each element in its array.
So, this means if we want to store a single element, then ArrayList
should take less time than HashSet
. Please correct me if I am wrong anywhere.
But when I am checking performance through code I am getting different behavior.
Case 1:
import java.util.*;
class HashsetVSArraylist
{
public static void main(String args[])
{
ArrayList<Integer> a1=new ArrayList<Integer>();
long nanos = System.nanoTime();
a1.add(1);
System.out.println("ArrayList Time:"+(System.nanoTime()-nanos)+"ns");
HashSet<Integer> h1=new HashSet<Integer>();
nanos = System.nanoTime();
h1.add(2);
System.out.println("HashSet Insertion Time:"+(System.nanoTime()-nanos)+"ns");
}
}
Output:
ArrayList Time:495087ns
HashSet Insertion Time:21757ns
Case 2:
import java.util.*;
class HashsetVSArraylist
{
public static void main(String args[])
{
HashSet<Integer> h1=new HashSet<Integer>();
long nanos = System.nanoTime();
h1.add(2);
System.out.println("HashSet Insertion Time:"+(System.nanoTime()-nanos)+"ns");
ArrayList<Integer> a1=new ArrayList<Integer>();
nanos = System.nanoTime();
a1.add(1);
System.out.println("ArrayList Time:"+(System.nanoTime()-nanos)+"ns");
}
}
Output:
HashSet Insertion Time:582527ns
ArrayList Time:21758ns
Now, I assume HashSet
should take more time for insertion of a single element. But, in both cases behavior is different...less time is taken for the data structure which comes second in the code. Also, behavior changes when the number of elements inserted is more than 1000.
Please explain what is happening.
Upvotes: 1
Views: 803
Reputation: 50061
Your benchmark is broken. Read: Dynamic compilation and performance measurement and: Anatomy of a flawed microbenchmark before trying to benchmark in Java.
The short explanation is that the total time duration you are trying to measure there is much, much too short, and the benchmark results will be swamped by tiny OS and CPU details, and by the fact that the Java VM is still busy compiling the bytecode to machine code while it begins to run.
Meanwhile, it is a bit mad to compare ArrayList and HashList on performance when they serve two different purposes, but all else being equal, the implementation of ArrayList is significantly simpler, so your assumption is surely correct; it will be faster.
Upvotes: 2
Reputation: 471
The true issue here is hidden by the auto-boxing that Java is doing in converting your integer primitive into an Integer object.
When you call a1.add(1) this is actually calling a1.add(Integer.valueOf(1))
The first time you reference the static valueOf method of the Integer class, that is causing the execution of the static initializer in the Integer class, which creates hundreds of static objects and on your system is taking about 500ms.
Even with that, there are a lot of other things going on behind the scenes that interfere with this test such as other static initializers, dynamic memory allocation, system resource allocation, and a myriad of others.
If you can design a test that eliminates or minimizes those variables, then you will find that adds to ArrayList is always faster than HashSet over the long run, but not for any given insertion. Luckily it should never be the case that we care about the speed of a single insertion.
For example imagine the nearly worst case scenario of trying to add a value to an ArrayList, but that ArrayList is at it's maximum allocated size. The ArrayList tries to allocate more room, but the system has hit it's current memory allocation cap so it needs to wait for the VM to allocate more memory with the system. Simultaneously the garbage collector gets kicked off. In this case an insert which may take <1ms normally can take multiple seconds to execute.
Upvotes: 1