Reputation: 115
I use a hashmap to store a QTable for an implementation of a reinforcement learning algorithm. My hashmap should store 15000000 entries. When I ran my algorithm I saw that the memory used by the process is over 1000000K. When I calculated the memory, I would expect it to use not more than 530000K. I tried to write an example and I got the same high memory usage:
public static void main(String[] args) {
HashMap map = new HashMap<>(16_000_000, 1);
for(int i = 0; i < 15_000_000; i++){
map.put(i, i);
}
}
My memory calulation:
Each entryset is 32 bytes
Capacity is 15000000
HashMap Instance uses: 32 * SIZE + 4 * CAPACITY
memory = (15000000 * 32 + 15000000 * 4) / 1024 = 527343.75K
Where I'm wrong in my memory calculations?
Upvotes: 9
Views: 15312
Reputation: 120858
I sort of had the same requirement as you.. so decided to throw my thoughts here.
1) There is a great tool for that: jol.
2) Arrays are objects too, and every object in java has two additional headers: mark and klass, usually 4 and 8 bytes in size (this can be tweaked via compressed pointers, but not going to go into details).
3) Is is important to note about the load factor here of the map (because it influences the resize of the internal array). Here is an example:
HashMap<Integer, Integer> map = new HashMap<>(16, 1);
for (int i = 0; i < 13; ++i) {
map.put(i, i);
}
System.out.println(GraphLayout.parseInstance(map).toFootprint());
HashMap<Integer, Integer> map2 = new HashMap<>(16);
for (int i = 0; i < 13; ++i) {
map2.put(i, i);
}
System.out.println(GraphLayout.parseInstance(map2).toFootprint());
Output of this is different(only the relevant lines):
1 80 80 [Ljava.util.HashMap$Node; // first case
1 144 144 [Ljava.util.HashMap$Node; // second case
See how the size is bigger for the second case because the backing array is twice as big (32 entries). You can only put 12 entries in a 16 size array, because the default load factor is 0.75: 16 * 0.75 = 12.
Why 144? The math here is easy: an array is an object, thus: 8+4 bytes for headers. Plus 32 * 4 for references = 140 bytes. Due to memory alignment of 8 bytes, there are 4 bytes for padding resulting in a total 144 bytes.
4) entries are stored inside either a Node or a TreeNode inside the map (Node is 32 bytes and TreeNode is 56 bytes). As you use ONLY integers, you will have only Nodes, as there should be no hash collisions. There might be collisions, but this does not yet mean that a certain array entry will be converted to a TreeNode, there is a threshold for that. We can easily prove that there will be Nodes only:
public static void main(String[] args) {
Map<Integer, List<Integer>> map = IntStream.range(0, 15_000_000).boxed()
.collect(Collectors.groupingBy(WillThereBeTreeNodes::hash)); // WillThereBeTreeNodes - current class name
System.out.println(map.size());
}
private static int hash(Integer key) {
int h = 0;
return (h = key.hashCode()) ^ h >>> 16;
}
The result of this will be 15_000_000, there was no merging, thus no hash-collisions.
5) When you create Integer objects there is pool for them (ranging from -127 to 128 - this can be tweaked as well, but let's not for simplicity).
6) an Integer is an object, thus it has 12 bytes header and 4 bytes for the actual int value.
With this in mind, let's try and see the output for 15_000_000 entries (since you are using a load factor of one, there is no need to create the internal capacity of 16_000_000). It will take a lot of time, so be patient. I also gave it a
-Xmx12G and -Xms12G
HashMap<Integer, Integer> map = new HashMap<>(15_000_000, 1);
for (int i = 0; i < 15_000_000; ++i) {
map.put(i, i);
}
System.out.println(GraphLayout.parseInstance(map).toFootprint());
Here is what jol said:
java.util.HashMap@9629756d footprint:
COUNT AVG SUM DESCRIPTION
1 67108880 67108880 [Ljava.util.HashMap$Node;
29999872 16 479997952 java.lang.Integer
1 48 48 java.util.HashMap
15000000 32 480000000 java.util.HashMap$Node
44999874 1027106880 (total)
Let's start from bottom.
total size of the hashmap footprint is: 1027106880 bytes or 1 027 MB.
Node instance is the wrapper class where each entry resides. it has a size of 32 bytes; there are 15 million entries, thus the line:
15000000 32 480000000 java.util.HashMap$Node
Why 32 bytes? It stores the hashcode(4 bytes), key reference (4 bytes), value reference (4 bytes), next Node reference (4 bytes), 12 bytes header, 4 bytes padding, resulting in 32 bytes total.
1 48 48 java.util.HashMap
A single hashmap instance - 48 bytes for it's internals.
If you really want to know why 48 bytes:
System.out.println(ClassLayout.parseClass(HashMap.class).toPrintable());
java.util.HashMap object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 12 (object header) N/A
12 4 Set AbstractMap.keySet N/A
16 4 Collection AbstractMap.values N/A
20 4 int HashMap.size N/A
24 4 int HashMap.modCount N/A
28 4 int HashMap.threshold N/A
32 4 float HashMap.loadFactor N/A
36 4 Node[] HashMap.table N/A
40 4 Set HashMap.entrySet N/A
44 4 (loss due to the next object alignment)
Instance size: 48 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
Next the Integer instances:
29999872 16 479997952 java.lang.Integer
30 million integer objects (minus 128 that are cached in the pool)
1 67108880 67108880 [Ljava.util.HashMap$Node;
we have 15_000_000 entries, but the internal array of a HashMap is a power of two size, that's 16,777,216 references of 4 bytes each.
16_777_216 * 4 = 67_108_864 + 12 bytes header + 4 padding = 67108880
Upvotes: 4
Reputation: 298153
Well, in the best case, we assume a word size of 32 bits/4 bytes (with CompressedOops and CompressedClassesPointers). Then, a map entry consists of two words JVM overhead (klass pointer and mark word), key, value, hashcode and next pointer, making 6 words total, in other words, 24 bytes. So having 15,000,000 entry instances will consume 360 MB.
Additionally, there’s the array holding the entries. The HashMap
uses capacities that are a power of two, so for 15,000,000 entries, the array size is at least 16,777,216, consuming 64 MiB.
Then, you have 30,000,000 Integer
instances. The problem is that map.put(i, i)
performs two boxing operations and while the JVM is encouraged to reuse objects when boxing, it is not required to do so and reusing won’t happen in your simple program that is likely to complete before the optimizer ever interferes.
To be precise, the first 128 Integer
instances are reused, because for values in the -128 … +127
range, sharing is mandatory, but the implementation does this by initializing the entire cache on the first use, so for the first 128
iterations, it doesn’t create two instances, but the cache consists of 256
instances, which is twice that number, so we end up again with 30,000,000 Integer
instances total. An Integer
instance consist of at least the two JVM specific words and the actual int
value, which would make 12 bytes, but due to the default alignment, the actually consumed memory will be 16 bytes, dividable by eight.
So the 30,000,000 created Integer
instances consume 480 MB.
This makes a total of 360 MB + 64 MiB + 480 MB, which is more than 900 MB, making a heap size of 1 GB entirely plausible.
But that’s what profiling tools are for. After running your program, I got
Note that this tool only reports the used size of the objects, i.e. the 12 bytes for an Integer
object without considering the padding that you will notice when looking at the total memory allocated by the JVM.
Upvotes: 9