Reputation: 1834
I have a dictionary as a text file mapping from 2M
words to 50k
words. I load this file into memory as HashMap<String, String>
by reading the file line by line, splitting on a separator and invoking myMap.put(line[0], line[1])
. The size of the text file is 45MB
, while the HashMap uses 350MB
of the heap. My goal is to reduce memory usage without harming lookup speed.
myMap.values().size()
returns 2M
instead of 50k
, suggesting that the values are stored as duplicates. Is there a way to make identical values point to the same String object?
Map<String, String> dict = new HashMap<>();
try (FileReader fr = new FileReader(FILE);
BufferedReader br = new BufferedReader(fr)) {
String line;
while ((line = br.readLine()) != null) {
String key_value[] = line.split(":");
dict.put(key_value[0], key_value[1].intern());
}
} catch (Exception e) {
e.printStackTrace();
}
Upvotes: 1
Views: 1985
Reputation: 1116
You can use String.intern()
on the values to make them all point to the same instance. But this has other problems like using the PermGenSpace, which is not garbage collected pre-Java 1.7.
You would call it like this: myMap.put(line[0], line[1].intern())
.
Maybe a map based on a trie is more efficient, but I haven't used that, yet. Also depends on the nature of your Strings. The more alike your keys are, the more space the trie can save.
http://code.google.com/p/trie-map/
Also see Dukeling's answer concerning keys().size()
and values().size()
and the use of another map to avoid duplicate values.
Upvotes: 2
Reputation: 55639
Regardless of whether or not duplicates point to the same objects, there will still need to be references to these objects, so size
should still return the size with duplicates included.
A simple example showing this.
If you want duplicates to point to the same objects, you'll have to do this outside of the HashMap
or hope the optimizer takes care of it.
Alternatives to String.intern()
as joe776 suggested are possibly with a self-written collection extending some Set
(since Set
doesn't have a Object get(Object)
method) or another HashMap
(having objects point to themselves) that allows you to get a reference to the common object.
Upvotes: 5