Niklas Mertsch
Niklas Mertsch

Reputation: 1489

Java-Collection: What collection to use is this very specific case?

I have the following scenario and am looking for the "best" implementation:

  1. I want to store items in a java.util.Collection to implement an interface
  2. All items are guaranteed to have a unique hashCode
  3. I know the max number n of items to be stored (max capacity is known on initialization)
  4. The hashCode is between 0 to n
  5. Order is not important, duplicates are not wanted (Set-properties are desired)
  6. Items can be added, but will never be removed
  7. Performance of contains is very important (desired: O(1), at least O(log_n))

My first thought was to use a new HashSet<item>(n+1, 1.0), but after some reading I found that it applies an internal hash function to the hashCode of the item, so hash collisions will still occur, even though the hashCodes are unique and hachCode <= n.

My second thought was to use a native array (new item[n]) and use the hashCode as index. This seems to be the implementation with the best performance, but my interface expects a java.util.Collection and the collection will be used with contains and add, which is not compatible with the benefits of this second approach.

Am I missing something, or do I have to accept the overhead and collisions of the HashSet to get the best performance?

Upvotes: 2

Views: 72

Answers (2)

Naim
Naim

Reputation: 86

Given the requirement, I suggest you to use HashSet. It will perform better. As you mentioned that the maximum number of elements and the maximum capacity of Set, both are of same size. Further each element have an unique hashcode. In that case, there is rare change of hash key collision. So dont worry about the contains function. With the mentioned dataset the contains will perform in O(1). Similarly the add will also perform in constant order. i.e. O(1).

Upvotes: 0

Eran
Eran

Reputation: 394146

Using HashSet will still give you good performance, but given the specific requirements you describe (and assuming n it not too large), you can create you own "ArraySet" implementation of the Set interface:

  • It will have a backing array of length n+1 to store the data.
  • contains will use the hashCode of an element to find if the index matching that hashCode() has a non null value.
  • add will use the hashsCode of the added element to find the index of the array to which you should add the element.
  • Any other required methods will be implemented similarly.

This solution may be slightly more efficient than HashSet, since it contains less overhead. It will, however, be expansive in terms of memory usage if n is large.

Upvotes: 4

Related Questions