Georg
Georg

Reputation: 982

Big byte arrays use more heap than expected

Using a 1 GB Java heap (-Xmx1g), I store data in many large byte arrays. I get an OutOfMemoryError quite some time before 1 GB of data is stored. At that time, there is still quite a lot of free heap as calculated by Runtime rt.maxMemory() - rt.totalMemory() + rt.freeMemory():

byte array size approx. data that can be stored approx. free heap shown
2^18 (262144) 800 MB 270 MB
2^17 (131072) 930 MB 140 MB
2^16 (65536) 997 MB 72 MB
2^15 (32768) 1032 MB 36 MB

Why is heap size calculation off for large byte arrays and can I do something to mend it?


Note: When using 2^19 (or larger) sized byte arrays, a different thing occurs: Java byte array of 1 MB or more takes up twice the RAM - let's focus this question on 2^18 sized byte arrays.

Run with 64-Bit Server VM AdoptOpenJDK 11.0.11, both on Windows java -cp .\lib\* -Xmx1g tryit.Main and Debian java -cp .:./lib/* -Xmx1g tryit.Main:

package tryit;

public class Main {
    public static void main(String[] args) throws Exception {
        byte[][] array = new byte[1000000][];
        long freeAtStart = free();
        System.out.println("Free at start: " + freeAtStart);
        int chunkSize = 2<<17; // This is 2^18.
        System.out.println("Chunk size   : " + chunkSize);
        for (int n = 0; n < 1000000; n++) {
            if (n % 50 == 0) {
                long currentFree = free();
                System.out.printf("%d: stored %d / allocated %d / free %d\n", n, n * chunkSize, freeAtStart - currentFree, currentFree);
            }
            array[n] = new byte[chunkSize];
        }
    }
    static long free() throws Exception {
        System.gc(); // Called just in case - there should not be anything to garbage collect.
        Thread.sleep(100); // Give GC some time to work
        return Runtime.getRuntime().maxMemory() - Runtime.getRuntime().totalMemory() + Runtime.getRuntime().freeMemory();
    }
}

And finally the (shortened) output of four runs:

2^15:
Free at start: 1068751960 / Chunk size: 32768
31500: stored 1032192000 / allocated 1032933912 / free 35818048

2^16:
Free at start: 1068751960 / Chunk size: 65536
15200: stored 996147200 / allocated 996627400 / free 72124560

2^17:
Free at start: 1068751960 / Chunk size: 131072
7100: stored 930611200 / allocated 930960032 / free 137791928

2^18:
Free at start: 1068751960 / Chunk size: 262144
3050: stored 799539200 / allocated 799823160 / free 268928800

2^19 (humongous objects - allocation size is two times stored size):
Free at start: 1068751960 / Chunk size: 524288
1000: stored 524288000 / allocated 1048811120 / free 19940840

Upvotes: 2

Views: 1360

Answers (1)

Thomas Kl&#228;ger
Thomas Kl&#228;ger

Reputation: 21435

As described in the linked answer (Java byte array of 1 MB or more takes up twice the RAM) and the G1 garbage collector documentation the G1 garbage collector divides the heap into regions of 1 MByte (2^20 bytes) each. For a 1GB heap that gives 1024 regions (probably a bit less due to administrative overhead).

Naively you would expect the a region of 2^20 bytes can hold 4 byte arrays of 2^18 bytes each - but unfortunately this is not the case. Byte arrays are objects, and objects have a hidden object header (see https://stackoverflow.com/a/50509263 for an explanation).

So the effective size of a byte[262144] is not 262144 bytes, it is 262160 bytes (depending on the JVM and the maximum heap size it could be even more), which means that each region can only hold 3 byte arrays of length 262144.

Combining 3 byte arrays per region with 1024 regions would give you a maximum of 3072 byte arrays of 262144 bytes for a 1 GB heap, which matches nicely with your numbers.


What you can do about it:

  • use larger regions (by supplying -XX:G1HeapRegionSize=4M) - a 4MB region can hold 15 byte arrays of length 262144, whereas 4 regions of 1MB can hold only 12 byte arrays of length 262144
  • use slightly smaller byte arrays - a 1MB region can hold only 3 byte arrays of length 262144, but 4 byte arrays of length 262128

Note: this post uses 2^20 to mean two to the power of twenty, which is not the same as the java expression 2^20, but rather 1<<20

Upvotes: 5

Related Questions