Peter

Reputation: 857

Reducing memory usage of very large HashMap

I have a very large hash map (2+ million entries) that is created by reading in the contents of a CSV file. Some information:

  1. The HashMap maps a String key (which is less than 20 chars) to a String value (which is approximately 50 characters).
  2. This HashMap is initialized with an initial capacity of 3 million, so with 2 million entries it stays only about two-thirds full (well under the default 0.75 load factor).
  3. The HashMap is only used by a single operation; once that operation completes, I call clear(). (The clear() doesn't appear to actually free memory — is a separate call to System.gc() necessary?)

One idea I had was to change the HashMap&lt;String, String&gt; to a HashMap&lt;Integer, String&gt; and use the hashCode of each String as the key. That would save some memory, but it risks collisions if two strings have identical hash codes ... how likely is this for strings that are less than 20 characters long?
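For scale: String.hashCode() returns a 32-bit int, so with 2 million distinct keys the birthday approximation (a back-of-the-envelope sketch using the figures above) predicts several hundred colliding pairs — enough that hashCode alone cannot safely serve as the key:

```java
public class CollisionEstimate {

    // Birthday approximation: n(n-1)/2 possible pairs, each colliding
    // with probability 1/2^32 under a uniform 32-bit hash.
    static double expectedCollisions(long n) {
        return (double) n * (n - 1) / (2 * Math.pow(2, 32));
    }

    public static void main(String[] args) {
        System.out.printf("Expected hashCode collisions for 2M keys: ~%.0f%n",
                expectedCollisions(2_000_000L));
    }
}
```

For 2 million keys this works out to roughly 465 expected collisions, so some keys would silently overwrite others.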

Does anyone else have any ideas on what to do here? The CSV file itself is only 100 MB, but Java ends up using over 600 MB of memory for this HashMap.

Thanks!

Upvotes: 0

Views: 4464

Answers (4)

Kapil Ghodawat

Reputation: 11

What you are trying to do is exactly a JOIN operation. Consider an in-memory DB like H2: load both CSV files into temp tables and then do a JOIN over them. In my experience, H2 handles bulk loads well, and this will certainly be faster and less memory-intensive than your manual HashMap-based joining method.
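A minimal sketch of that approach, assuming the H2 driver is on the classpath and using H2's CSVREAD table function (the file contents, table names, and column headers K/V1/V2 here are placeholders, not from the question):

```java
import java.io.File;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class H2CsvJoin {

    // Load two CSV files into an in-memory H2 database and JOIN them
    // on column K. Column names are taken from the CSV header rows.
    static List<String> joinCsv(File left, File right) throws Exception {
        List<String> rows = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:joindb");
             Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE L AS SELECT * FROM CSVREAD('"
                    + left.getAbsolutePath() + "')");
            st.execute("CREATE TABLE R AS SELECT * FROM CSVREAD('"
                    + right.getAbsolutePath() + "')");
            try (ResultSet rs = st.executeQuery(
                    "SELECT L.K, L.V1, R.V2 FROM L JOIN R ON L.K = R.K ORDER BY L.K")) {
                while (rs.next()) {
                    rows.add(rs.getString(1) + "," + rs.getString(2) + "," + rs.getString(3));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // Two tiny CSV files standing in for the real inputs
        File left = File.createTempFile("left", ".csv");
        File right = File.createTempFile("right", ".csv");
        left.deleteOnExit();
        right.deleteOnExit();
        try (PrintWriter out = new PrintWriter(left, "UTF-8")) {
            out.println("K,V1");
            out.println("a,one");
        }
        try (PrintWriter out = new PrintWriter(right, "UTF-8")) {
            out.println("K,V2");
            out.println("a,uno");
        }
        joinCsv(left, right).forEach(System.out::println);
    }
}
```

The key property is that H2 streams the join from disk-backed or memory-backed tables rather than holding every String pair in a Java HashMap, so the heap footprint stays small.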

Upvotes: 1

Sam Barnum

Reputation: 10714

Parse the CSV and build a Map whose keys are your existing keys, but whose values are integer offsets into the file for that key.

When you want the value for a key, find the index in the map, then use a RandomAccessFile to read that line from the file. Keep the RandomAccessFile open during processing, then close it when done.
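A sketch of that two-pass scheme (using Long rather than Integer offsets, since RandomAccessFile positions are longs and a file can exceed 2 GB; the tiny temp file and single-column key format are illustrative assumptions):

```java
import java.io.File;
import java.io.PrintWriter;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

public class OffsetIndex {

    // Pass 1: map each CSV key to the byte offset where its line starts.
    static Map<String, Long> buildIndex(File csv) throws Exception {
        Map<String, Long> index = new HashMap<>();
        try (RandomAccessFile raf = new RandomAccessFile(csv, "r")) {
            long offset = raf.getFilePointer();
            String line;
            while ((line = raf.readLine()) != null) {
                index.put(line.substring(0, line.indexOf(',')), offset);
                offset = raf.getFilePointer();
            }
        }
        return index;
    }

    // On demand: seek to the stored offset and re-read the line.
    static String readLineAt(File csv, long offset) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(csv, "r")) {
            raf.seek(offset);
            return raf.readLine();
        }
    }

    public static void main(String[] args) throws Exception {
        File csv = File.createTempFile("data", ".csv"); // stand-in for the real 100 MB file
        csv.deleteOnExit();
        try (PrintWriter out = new PrintWriter(csv, "ISO-8859-1")) {
            out.print("alpha,first value\n");
            out.print("beta,second value\n");
        }
        Map<String, Long> index = buildIndex(csv);
        System.out.println(readLineAt(csv, index.get("beta")));
    }
}
```

In real use you would keep one RandomAccessFile open across all lookups, as the answer says, rather than reopening it per read. The memory win is that the map holds 8-byte offsets instead of ~50-character value Strings.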

Upvotes: 1

Rich

Reputation: 12653

It sounds like you have the framework to try this already. Instead of adding the string, add the string.hashCode() and see if you get collisions.

In terms of freeing up memory, the JVM generally doesn't shrink its heap, but it will garbage-collect when it needs to.

Also, it sounds like you might have an algorithm that doesn't need the hash table at all. Could you describe what you're trying to do in a little more detail?
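A minimal version of that collision experiment (the two-string example uses "Aa" and "BB", a well-known colliding pair under Java's String.hashCode()):

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionCheck {

    // Insert each string keyed by its hashCode; any key that displaces a
    // *different* string is a genuine collision.
    static int countCollisions(String... keys) {
        Map<Integer, String> byHash = new HashMap<>();
        int collisions = 0;
        for (String key : keys) {
            String prev = byHash.put(key.hashCode(), key);
            if (prev != null && !prev.equals(key)) {
                collisions++;
                System.out.println("\"" + prev + "\" and \"" + key
                        + "\" share hashCode " + key.hashCode());
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" both hash to 2112, so this reports one collision
        countCollisions("Aa", "BB", "hello");
    }
}
```

Run against the real 2 million keys, a nonzero count means hashCode-as-key would silently drop entries.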

Upvotes: 1

Ryan Stewart

Reputation: 128749

If performance isn't the primary concern, store the entries in a database instead. Then memory isn't a concern, and you have good, if not great, search speed thanks to the database.

Upvotes: 0
