Reputation: 163
I know that insert/delete works in O(1) time with Java HashMaps.
But is it still the fastest data structure if I have over a million objects (with distinct keys - i.e. each object has a unique key) in my HashMap?
Upvotes: 1
Views: 2319
Reputation: 719281
TL;DR - profile your code!
The average performance of HashMap insertion and deletion scales as O(1) (assuming you have a sound hashCode() method on the keys¹) until you start running into 2nd-order memory effects:

- The bigger the heap gets, the more work the garbage collector has to do. A big enough HashMap will do that ...
- The operating system starts swapping pages to disk once the JVM's memory demand exceeds physical RAM. A big enough HashMap will do that ... if your heap size is bigger than the amount of physical RAM available to the JVM process.

There is also a limit of about 2^31 on the size of a HashMap's primary hash array. So if you have more than about 2^31 / 0.75 entries, the performance of the current HashMap implementation is theoretically O(N). However, we are talking billions of entries, and the 2nd-order memory effects will be impacting performance well before then.
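To get a feel for whether you are anywhere near those 2nd-order effects, a back-of-envelope heap estimate is enough. The sketch below is only illustrative; the ~48 bytes per entry is an assumed ballpark for entry-plus-object overhead, not a measured figure.

    public class HeapBudgetSketch {
        public static void main(String[] args) {
            long entries = 1_000_000L;
            long assumedBytesPerEntry = 48;      // assumption: HashMap.Node plus key/value object overhead
            long estimated = entries * assumedBytesPerEntry;
            long maxHeap = Runtime.getRuntime().maxMemory();   // effective -Xmx as seen by the JVM
            System.out.printf("estimated map overhead: ~%d MB, max heap: %d MB%n",
                    estimated / (1024 * 1024), maxHeap / (1024 * 1024));
        }
    }

If the estimate is anywhere close to the physical RAM of the machine (not just the heap limit), the swapping effect described above is what will dominate, not the hash table itself.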
¹ - If your keys have a poor hashCode() function, you may find that a significant proportion of the keys hash to the same code. If that happens, lookup, insert and delete performance for those keys will be either O(logN) or O(N), depending on the key's type and your Java version. In this case, N is the number of keys in the table with the same hashcode as the one you are looking up, etc.
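To make that footnote concrete, here is a minimal sketch of a sound hashCode()/equals() pair next to a deliberately bad one. UserKey and its fields are hypothetical, not taken from the question.

    import java.util.Objects;

    final class UserKey {
        private final long id;
        private final String region;

        UserKey(long id, String region) { this.id = id; this.region = region; }

        // BAD: every key collides, so all keys land in one bucket. Lookups in that
        // bucket degrade to O(logN) (treeified buckets, Java 8+ with Comparable keys)
        // or O(N) (older Java, or keys that cannot be ordered).
        // @Override public int hashCode() { return 42; }

        // Sound: spreads keys across buckets and is consistent with equals().
        @Override public int hashCode() { return Objects.hash(id, region); }

        @Override public boolean equals(Object o) {
            if (!(o instanceof UserKey)) return false;
            UserKey k = (UserKey) o;
            return id == k.id && Objects.equals(region, k.region);
        }
    }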
Is HashMap the fastest data structure for your use-case?

That is hard to answer without more detail. (A data structure tuned to a specific use-case can beat a general-purpose one; HashMap is general purpose.) For example, you first need to be sure that the HashMap really is the cause of your performance problems. Sure, you >>think<< it is, but have you actually profiled your code to find out? Until you do this, you risk wasting your time on optimizing something that isn't the bottleneck.
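If you want a first, very rough data point before reaching for a profiler, a throwaway timing loop like the following is enough to see ballpark numbers. It is only a sketch: System.nanoTime() around a loop ignores JIT warm-up and other effects, so use a real profiler or a harness such as JMH for actual conclusions.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ThreadLocalRandom;

    public class CrudeMapTiming {
        public static void main(String[] args) {
            Map<Long, String> map = new HashMap<>();

            long start = System.nanoTime();
            for (long i = 0; i < 1_000_000; i++) {
                map.put(i, "value-" + i);          // one million inserts
            }
            long putNanos = System.nanoTime() - start;

            start = System.nanoTime();
            long hits = 0;
            for (int i = 0; i < 1_000_000; i++) {
                if (map.containsKey(ThreadLocalRandom.current().nextLong(1_000_000))) hits++;
            }
            long getNanos = System.nanoTime() - start;

            System.out.printf("1M puts: %d ms, 1M lookups: %d ms (hits=%d)%n",
                    putNanos / 1_000_000, getNanos / 1_000_000, hits);
        }
    }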
Upvotes: 2
Reputation: 422
So HashMaps will have O(1) insert/delete even for a huge number of objects. The problem with a huge amount of data is space, not time. For a million entries you may well be fine keeping it all in memory.
Java's HashMap has a default load factor of 0.75, meaning that it would need about 1.33 million slots to support a map of a million entries. If you can support that in memory, it's all fine. Even if you can't hold it all in memory, you'd probably still want to use HashMaps, perhaps a distributed HashMap.
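If you know the size up front, you can account for that load factor when constructing the map so it never has to rehash on the way to a million entries. A minimal sketch (the one-million figure is simply the number from the question):

    import java.util.HashMap;
    import java.util.Map;

    public class PresizedMap {
        public static void main(String[] args) {
            int expectedEntries = 1_000_000;
            float loadFactor = 0.75f;
            // Request enough capacity that expectedEntries stays under the resize threshold.
            int initialCapacity = (int) Math.ceil(expectedEntries / (double) loadFactor);

            Map<String, Object> map = new HashMap<>(initialCapacity, loadFactor);
            System.out.println("capacity requested: " + initialCapacity);
        }
    }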
As far as Big-O goes, it describes how the running time grows as the input grows, so the analysis really pays off as data sizes get larger and larger. For a really small data set the hidden constants dominate: an algorithm that takes 5n+10 steps and one that takes n steps are both O(n), yet they behave quite differently at small n. The reason constant time, O(1), is so valuable is that the cost doesn't depend on the size of the data set. Therefore, for a large data set like the one you're describing, a HashMap would be an excellent option due to its constant-time insert/delete.
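As an illustration of "doesn't depend on the size of the data set", the sketch below probes a HashMap and an ArrayList built from the same million keys. A single timing like this is noisy and only meant to show the shape of the difference, not to serve as a benchmark.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class ConstantVsLinear {
        public static void main(String[] args) {
            int n = 1_000_000;
            Map<Integer, Integer> map = new HashMap<>();
            List<Integer> list = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                map.put(i, i);
                list.add(i);
            }

            int probe = n - 1;                       // worst case for the linear scan
            long t0 = System.nanoTime();
            boolean inMap = map.containsKey(probe);  // inspects a single bucket
            long t1 = System.nanoTime();
            boolean inList = list.contains(probe);   // walks up to n elements
            long t2 = System.nanoTime();

            System.out.printf("map: %b in %d ns, list: %b in %d ns%n",
                    inMap, t1 - t0, inList, t2 - t1);
        }
    }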
Upvotes: 0