Reputation: 749
I'm processing some generated data files (hundreds of MB) that contain several G objects. I need random access to these objects. A possible implementation, I guess, might be a big hash table. My program is written in Java, and it seems java.util.HashMap cannot handle this (somehow it's extremely slow). Could anyone recommend a solution for random access to these objects?
Upvotes: 1
Views: 287
Reputation: 533500
A few hundred MB cannot hold several billion objects unless each object is a bit (which is not really an object IMHO).
How I would approach this is to use a memory-mapped file to map in the contents of the data, and to build your own hash table in another memory-mapped file (which requires you to scan the data once to build the keys).
Depending on the layout of the data, it is worth remembering that random access is not the most cache-efficient access pattern: the CPU loads cache lines of 64 bytes (depending on architecture), so if your structure doesn't fit in memory, record-based tables may be more efficient.
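A minimal sketch of the memory-mapped approach, assuming a hypothetical file of fixed-size records (here, one long per record; the file name, record size, and layout are illustrative assumptions, not part of the question):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedLookup {
    // Assumed layout for illustration: fixed-size records of one long each.
    static final int RECORD_SIZE = 8;

    // Random access to record `index` without loading the whole file on the heap;
    // the OS pages the needed bytes in on demand.
    static long readRecord(Path file, long index) throws Exception {
        try (FileChannel ch = FileChannel.open(file)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.getLong((int) (index * RECORD_SIZE));
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny sample file of 1000 longs (record i stores i * 2).
        Path file = Files.createTempFile("data", ".bin");
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            for (long i = 0; i < 1000; i++) raf.writeLong(i * 2);
        }
        System.out.println(readRecord(file, 123)); // prints 246
        Files.delete(file);
    }
}
```

Note that a single MappedByteBuffer is limited to 2 GB; for multi-GB files you would map the file in several chunks and pick the chunk from the record offset.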
Upvotes: 0
Reputation: 718788
If a HashMap is extremely slow, then the two most likely causes are as follows:
The hashCode() and/or equals(Object) methods on your key class could be very expensive. For instance, if you use an array or a collection as a key, the hashCode() method will access every element each time you call it, and the equals method will do the same for equal keys.
Your key class could have a poor hashCode() method that gives the same value for a significant percentage of the (distinct) keys used by the program. When this happens you get many key collisions, which can be really bad for performance once the hash table gets large.
I suggest you look at these possibilities first ... before changing your data structure.
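The first cause can be measured directly. This sketch (the key contents are made up for illustration) times List.hashCode() on a large collection key, which by the List contract must visit every element:

```java
import java.util.ArrayList;
import java.util.List;

public class ExpensiveKeyDemo {
    public static void main(String[] args) {
        // Hypothetical key: a collection with a million elements.
        List<Integer> bigKey = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) bigKey.add(i);

        long start = System.nanoTime();
        int h = bigKey.hashCode();          // walks all 1,000,000 elements
        long elapsed = System.nanoTime() - start;
        System.out.println("hashCode of 1M-element key: " + h
                + " (" + elapsed + " ns)");

        // A cheap alternative is to derive a small immutable key
        // (e.g. a record id) and use that instead of the collection itself.
    }
}
```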
Note: if "several G objects" means several billion objects, then you'll have trouble holding the files' contents in memory ... unless you are running this application on a machine with hundreds of gigabytes of RAM. I advise you do some "back of the envelope" calculations to see if what you are trying to do is feasible.
Upvotes: 4
Reputation: 14678
Whatever your keys are, make sure you're generating a good hash for each one via hashCode(). A lot of the time, bad HashMap performance can be blamed on colliding hashes. When there's a collision, HashMap places the colliding entries in a linked list.
In the worst case, if you return the same hash for all objects, the HashMap essentially becomes a linked list. Here's a good starting place for writing hash functions: http://www.javamex.com/tutorials/collections/hash_function_guidelines.shtml
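To make the worst case concrete, here is a sketch of a key class (hypothetical, for illustration only) whose hashCode() returns a constant, so every entry lands in one bucket:

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Worst-case key: constant hashCode() means every entry collides.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }   // same hash for all keys
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    // Time n insertions with either colliding or well-distributed keys.
    static long fillTimeNanos(int n, boolean bad) {
        Map<Object, Integer> map = new HashMap<>();
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) map.put(bad ? new BadKey(i) : i, i);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 20_000;
        System.out.println("good keys: " + fillTimeNanos(n, false) + " ns");
        System.out.println("bad keys:  " + fillTimeNanos(n, true) + " ns");
    }
}
```

One caveat: since Java 8, HashMap converts long collision chains into balanced trees, which softens the degradation from O(n) toward O(log n) per operation, but colliding keys are still far slower than well-distributed ones.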
Upvotes: 1