Reputation: 857
I have a very large hash map (2+ million entries) that is created by reading in the contents of a CSV file.
One idea I had was to change the HashMap's key type from String to Integer, using the hashCode() of each String as the key. That would save a bit of memory, but it risks collisions if two strings have identical hash codes. How likely is that for strings that are less than 20 characters long?
Does anyone else have any ideas on what to do here? The CSV file itself is only 100 MB, but Java ends up using over 600 MB of memory for this HashMap.
Thanks!
Upvotes: 0
Views: 4464
Reputation: 11
What you are trying to do is exactly a JOIN operation. Consider an in-memory database like H2: load both CSV files into temp tables, then do a JOIN over them. In my experience H2 handles bulk loads well, and this will certainly be faster and less memory-intensive than your manual HashMap-based join.
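A minimal sketch of that approach, assuming the H2 jar is on the classpath. CSVREAD is H2's built-in CSV import; the file names and the join column ID are hypothetical placeholders for your actual data:

```java
import java.sql.*;

public class CsvJoinWithH2 {
    public static void main(String[] args) throws SQLException {
        // In-memory H2 database; the data lives in H2's tables, not your HashMap.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:joindb");
             Statement st = conn.createStatement()) {

            // Load each CSV into a temp table via H2's CSVREAD function.
            st.execute("CREATE TABLE a AS SELECT * FROM CSVREAD('left.csv')");
            st.execute("CREATE TABLE b AS SELECT * FROM CSVREAD('right.csv')");

            // Index the join column so the JOIN doesn't scan table b per row.
            st.execute("CREATE INDEX idx_b ON b(ID)");

            try (ResultSet rs = st.executeQuery(
                    "SELECT a.*, b.* FROM a JOIN b ON a.ID = b.ID")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```

If even the database itself is too big for the heap, a file-backed URL such as jdbc:h2:./joindb keeps the tables on disk instead of in memory.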
Upvotes: 1
Reputation: 10714
Parse the CSV, and build a Map whose keys are your existing keys, but whose values are integer offsets into the file where that key's line starts.
When you want the value for a key, look up its offset in the map, then use a RandomAccessFile to read that line from the file. Keep the RandomAccessFile open during processing, then close it when done.
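A minimal sketch of that index, assuming the key is the first CSV column; "data.csv" and "someKey" are hypothetical, and it stores long offsets rather than Integer so files larger than 2 GB still work:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

public class CsvOffsetIndex {
    public static void main(String[] args) throws IOException {
        String path = "data.csv"; // hypothetical file name
        Map<String, Long> index = new HashMap<>();

        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            // Pass 1: record the byte offset of each line, keyed by the first
            // CSV column. RandomAccessFile.readLine assumes a single-byte
            // encoding, which is fine for plain ASCII keys.
            long offset = raf.getFilePointer();
            String line;
            while ((line = raf.readLine()) != null) {
                int comma = line.indexOf(',');
                String key = comma >= 0 ? line.substring(0, comma) : line;
                index.put(key, offset);
                offset = raf.getFilePointer();
            }

            // Lookup: seek back to the recorded offset and re-read that line.
            Long off = index.get("someKey"); // hypothetical key
            if (off != null) {
                raf.seek(off);
                System.out.println(raf.readLine());
            }
        }
    }
}
```

This trades heap for I/O: the map only holds a String and a long per entry, and each lookup costs one seek and one line read.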
Upvotes: 1
Reputation: 12653
It sounds like you have the framework to try this already. Instead of adding the string, add the string.hashCode() and see if you get collisions.
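A minimal sketch of that check (the sample keys here are hypothetical; in practice you'd feed in the 2+ million keys from the CSV):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashCollisionCheck {
    public static void main(String[] args) {
        // Hypothetical stand-ins for the keys read from the CSV.
        List<String> keys = List.of("alpha", "beta", "gamma");

        Map<Integer, String> seen = new HashMap<>();
        int collisions = 0;
        for (String key : keys) {
            // putIfAbsent returns the previous holder if this hash was taken.
            String prev = seen.putIfAbsent(key.hashCode(), key);
            if (prev != null && !prev.equals(key)) {
                collisions++;
                System.out.println("Collision: \"" + prev + "\" vs \"" + key + "\"");
            }
        }
        System.out.println(collisions + " collision(s) among " + keys.size() + " keys");
    }
}
```

For scale: assuming roughly uniform hash codes, the birthday bound predicts about n²/2³³ ≈ 470 colliding pairs among 2 million keys spread over the 2³² possible int values, so some collisions are effectively certain at this size regardless of string length.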
In terms of freeing up memory, the JVM generally doesn't shrink its heap or return memory to the OS, but it will garbage collect within that heap when it needs to.
Also, it sounds like you might have an algorithm that doesn't need the hash table at all. Could you describe what you're trying to do in a little more detail?
Upvotes: 1
Reputation: 128749
If performance isn't the primary concern, store the entries in a database instead. Then memory isn't a concern, and you get good, if not great, lookup speed thanks to the database's indexes.
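A minimal sketch of that approach, using embedded H2 as an example; the JDBC URL, table layout, and key are hypothetical, and any JDBC database with an indexed key column would work the same way:

```java
import java.sql.*;

public class DbLookup {
    public static void main(String[] args) throws SQLException {
        // File-backed H2 database so the entries live on disk, not the heap.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./entries")) {
            try (Statement st = conn.createStatement()) {
                // PRIMARY KEY gives an index, so lookups avoid a full scan.
                st.execute("CREATE TABLE IF NOT EXISTS entries("
                        + "k VARCHAR PRIMARY KEY, v VARCHAR)");
            }
            try (PreparedStatement ps =
                     conn.prepareStatement("SELECT v FROM entries WHERE k = ?")) {
                ps.setString(1, "someKey"); // hypothetical key
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        System.out.println(rs.getString("v"));
                    }
                }
            }
        }
    }
}
```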
Upvotes: 0