YaleBD

Reputation: 163

Insert/Delete in O(1) time in HashMaps with millions of objects (with distinct keys)?

I know that insert/delete works in O(1) time with Java HashMaps.

But is it still the fastest data structure if I have over a million objects (with distinct keys - i.e. each object has a unique key) in my HashMap?

Upvotes: 1

Views: 2319

Answers (2)

Stephen C

Reputation: 719281

TL;DR - profile your code!

The average performance of HashMap insertion and deletion scales as O(1) (assuming you have a sound hashCode() method on the keys [1]) until you start running into 2nd-order memory effects:

  • The larger the heap is, the longer it takes to garbage collect. Generally, the factors with the most impact are the number and size of non-garbage objects. A big enough HashMap will do that ...
  • Your hardware has a limited amount of physical memory. If your JVM's memory demand grows beyond that, the host OS will "swap" memory pages between RAM and disk. A big enough HashMap will do that ... if your heap size is bigger than the amount of physical RAM available to the JVM process.
  • There are memory effects due to the sizes of your processor's memory caches and TLB. Basically, if the processor's demand for reading and writing memory is too great, the memory system becomes the bottleneck. These effects can be exacerbated by a large heap and highly non-localized access patterns. (And running the GC!)

There is also a limit of about 2^31 on the size of a HashMap's primary hash array. So if you have more than about 2^31 / 0.75 entries, the performance of the current HashMap implementation is theoretically O(N). However, we are talking billions of entries, and the 2nd-order memory effects will be impacting performance well before then.


[1] - If your keys have a poor hashCode() function, then you may find that a significant proportion of the keys hash to the same code. If that happens, lookup, insert and delete performance for those keys will be either O(logN) or O(N) ... depending on the key's type, and your Java version. In this case, N is the number of keys in the table with the same hashcode as the one you are looking up, etc.
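
As a rough illustration of that footnote, here is a hypothetical key class (the names BadKey / GoodKey are made up for this sketch) whose hashCode() sends every instance to the same bucket, alongside a well-behaved variant. Timing a bulk insert of each shows how badly collisions can hurt:

    import java.util.HashMap;
    import java.util.Map;

    public class BadHashDemo {

        // Hypothetical key: every instance hashes to the same bucket.
        static final class BadKey {
            final int id;
            BadKey(int id) { this.id = id; }
            @Override public boolean equals(Object o) {
                return o instanceof BadKey && ((BadKey) o).id == id;
            }
            @Override public int hashCode() { return 42; }  // all keys collide
        }

        // Well-behaved variant for comparison.
        static final class GoodKey {
            final int id;
            GoodKey(int id) { this.id = id; }
            @Override public boolean equals(Object o) {
                return o instanceof GoodKey && ((GoodKey) o).id == id;
            }
            @Override public int hashCode() { return Integer.hashCode(id); }
        }

        public static void main(String[] args) {
            Map<BadKey, Integer> bad = new HashMap<>();
            Map<GoodKey, Integer> good = new HashMap<>();

            long t0 = System.nanoTime();
            for (int i = 0; i < 100_000; i++) bad.put(new BadKey(i), i);
            long t1 = System.nanoTime();
            for (int i = 0; i < 100_000; i++) good.put(new GoodKey(i), i);
            long t2 = System.nanoTime();

            System.out.printf("bad hashCode: %d ms, good hashCode: %d ms%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
    }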


Is HashMap the fastest data structure for your use-case?

  • It is hard to say without more details of your use-case.
  • It is hard to say without understanding how much time and effort you are prepared to put into the problem. (If you put in enough coding effort, you could almost certainly trim a few percent off. Maybe a lot more. HashMap is general purpose.)
  • It is hard to say without you (first!) doing a proper performance analysis.

For example, you first need to be sure that the HashMap really is the cause of your performance problems. Sure, you >>think<< it is, but have you actually profiled your code to find out? Until you do this, you risk wasting your time on optimizing something that isn't the bottleneck.
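
As a starting point, a very crude timing harness like the sketch below (class and variable names are illustrative; for anything serious use a real profiler or a benchmarking framework such as JMH) can at least tell you whether the map operations are the slow part:

    import java.util.HashMap;
    import java.util.Map;

    public class MapTimingSketch {
        public static void main(String[] args) {
            final int n = 1_000_000;
            // Pre-sized so resizing doesn't distort the measurement.
            Map<Integer, String> map = new HashMap<>(2 * n);

            long t0 = System.nanoTime();
            for (int i = 0; i < n; i++) map.put(i, "value-" + i);
            long t1 = System.nanoTime();
            for (int i = 0; i < n; i++) map.remove(i);
            long t2 = System.nanoTime();

            System.out.printf("insert: %.1f ms, delete: %.1f ms%n",
                    (t1 - t0) / 1e6, (t2 - t1) / 1e6);
        }
    }

Run it a few times; JIT warm-up and GC pauses make single runs noisy, which is exactly why a profiler or JMH is the better tool.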

Upvotes: 2

gwcoderguy

Reputation: 422

So HashMaps will have O(1) insert/delete even for a huge number of objects. The problem for a huge amount of data is the space. For a million entries you may be fine keeping it all in memory.

Java's HashMap has a default load factor of 0.75, meaning that it would need about 1.33 million slots to support a map of a million entries. If you can fit this in memory, it's all fine. Even if you can't hold it all in memory, you'd probably still want to use HashMaps, perhaps a distributed hash map.
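
If you know the entry count up front, you can do that load-factor arithmetic yourself and pre-size the map so it never needs to rehash. A minimal sketch (the numbers simply restate the ~1.33 million slot estimate above):

    import java.util.HashMap;
    import java.util.Map;

    public class PreSizedMap {
        public static void main(String[] args) {
            int expectedEntries = 1_000_000;
            float loadFactor = 0.75f;  // HashMap's default

            // Need capacity >= entries / loadFactor, i.e. ~1.33 million slots here.
            int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor);

            Map<Long, String> map = new HashMap<>(initialCapacity, loadFactor);
            System.out.println("requested capacity: " + initialCapacity);  // ~1,333,334
        }
    }

Note that HashMap rounds the requested capacity up to the next power of two internally, so the actual table will be somewhat larger.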

As far as Big-O time goes, it describes an upper bound on how the cost grows with the size of the input. Big-O analysis is really only useful as data sizes get larger and larger: asymptotically O(5n+10) is the same as O(n), but on a really small data set the constant factors that Big-O hides can dominate. The reason constant time ( O(1) ) is so valuable is that it means the time doesn't depend on the size of the data set. Therefore, for a large data set like the one you're describing, a HashMap would be an excellent option due to its constant time insert/delete.
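
A crude way to see that size-independence is to time a fixed number of lookups against maps of different sizes; in practice the larger map may come out somewhat slower because of the cache effects described in the other answer, but the per-lookup cost stays in the same ballpark. A minimal sketch (names are illustrative):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Random;

    public class ConstantTimeLookup {
        public static void main(String[] args) {
            Random rnd = new Random(1);
            for (int size : new int[] {10_000, 1_000_000}) {
                Map<Integer, Integer> map = new HashMap<>();
                for (int i = 0; i < size; i++) map.put(i, i);

                long t0 = System.nanoTime();
                long sum = 0;
                for (int i = 0; i < 1_000_000; i++) sum += map.get(rnd.nextInt(size));
                long t1 = System.nanoTime();

                System.out.printf("size=%d: %.1f ms for 1M lookups (checksum %d)%n",
                        size, (t1 - t0) / 1e6, sum);
            }
        }
    }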

Upvotes: 0
