Reputation: 1489
I have the following scenario and am looking for the "best" implementation:
java.util.Collection
to implement an interfacehashCode
n
of items to be stored (max capacity
is known on initialization)hashCode
is between 0
to n
Set
-properties are desired)contains
is very important (desired: O(1)
, at least O(log_n)
)My first thought was to use a new HashSet<item>(n+1, 1.0)
, but after some reading I found that it applies an internal hash function to the hashCode
of the item, so hash collisions will still occur, even though the hashCodes
are unique and hachCode <= n
.
My second thought was to use a native array (new item[n]
) and use the hashCode
as index. This seems to be the implementation with the best performance, but my interface expects a java.util.Collection
and the collection will be used with contains
and add
, which is not compatible with the benefits of this second approach.
Am I missing something, or do I have to accept the overhead and collisions of the HashSet
to get the best performance?
Upvotes: 2
Views: 72
Reputation: 86
Given the requirement, I suggest you to use HashSet. It will perform better. As you mentioned that the maximum number of elements and the maximum capacity of Set, both are of same size. Further each element have an unique hashcode. In that case, there is rare change of hash key collision. So dont worry about the contains function. With the mentioned dataset the contains will perform in O(1). Similarly the add will also perform in constant order. i.e. O(1).
Upvotes: 0
Reputation: 394146
Using HashSet
will still give you good performance, but given the specific requirements you describe (and assuming n
it not too large), you can create you own "ArraySet" implementation of the Set
interface:
n+1
to store the data.contains
will use the hashCode
of an element to find if the index matching that hashCode()
has a non null value.add
will use the hashsCode
of the added element to find the index of the array to which you should add the element.This solution may be slightly more efficient than HashSet
, since it contains less overhead. It will, however, be expansive in terms of memory usage if n
is large.
Upvotes: 4