Reputation: 21
I am stuck with a problem where I have millions of key-value pairs that I need to access using the keys randomly (not by using an iterator).
The range of keys is not known at compile time, but total number of the key-value pairs is known.
I have looked into HashMap and Hashset data structures but they are not truly O(1) as in case of collision in the hash-code they become array of LinkedLists which has linear search complexity at worst case.
I have also considered increasing the number of buckets in the HashMap but it does not ensure that every element will be stored in a separate bucket.
Is there any way to store and access millions of key-value pairs with O(1) complexity?
Ideally I would like every key to be like a variable and corresponding value should be the value assigned to that key
Thanks in advance.
Upvotes: 0
Views: 1560
Reputation: 55619
No, there isn't such a (known) data structure for generic data types.
If there were, it would most likely have replaced hash tables in most commonly-used libraries, unless there's some significant disadvantage like a massive constant factor or ridiculous memory usage, either of which would probably make it nonviable for you as well.
I said "generic data types" above, as there may be some specific special cases for which it's possible, such as when the key is a integer in a small range - in this case you could just have an array where each index corresponds to the same key, but this is also really a hash table where the key hashes to itself.
Note that you need a terrible hash function, the pathological input for your hash function, or a very undersized hash table to actually get the worst-case O(n) performance for your hash table. You really should test it and see if it's fast enough before you go in search of something else. You could also try TreeMap
, which, with its O(log n) operations, will sometimes outperform HashMap
.
Upvotes: 1
Reputation: 10994
I think you are confusing what Big O notation represents. It defines limiting behavior of a function, not necessarily actual behavior.
The average complexity of a hash map is O(1) for insert, delete, and search operations. What does this mean? In means, on average, those operations will complete in constant time regardless of the size of the hash map. So, depending on the implementation of the map, a lookup might not take exactly one step but it will most likely not involve more than a few steps, relative to the hash map's size.
How well a hash map actually behaves for those operations is determined by a few factors. The most obvious is the hash function used to bucket keys. Hash functions that distribute the computed hashes more uniformly over the hash range and limit the number of collisions are preferred. The better the hash function in those areas, the closer a hash map will actually operate in constant time.
Another factor that affects actual hash map behavior is how storage is managed. How a map resizes and repositions entries as items are added and removed helps control hash collisions by using an optimal number of buckets. Managing the hash map storage affectively will allow the hash map to operate close to constant time.
With all that said, there are ways to construct hash maps that have O(1) worst case behavior for lookups. This is accomplished using a perfect hash function. A perfect hash function is an invertible 1-1 function between keys and hashes. With a perfect hash function and the proper hash map storage, O(1) lookups can be achieved. The prerequisite for using this approach is knowing all the key values in advance so a perfect hash function can be developed.
Sadly, your case does not involve known keys so a perfect hash function can not be constructed but, the available research might help you construct a near perfect hash function for your case.
Upvotes: 2