Xellos

Reputation: 349

Memory leak in unknown module reported with JNA and JUnit

This is a specific issue I found while making an MRE for my earlier question. It's not clear whether it's the same issue, even though it seems related, so I'm asking it as a separate question.

Consider the following JUnit test class:

import com.sun.jna.Memory;
import com.sun.jna.Pointer;
import java.lang.Integer;
import java.util.ArrayList;
import java.util.List;
import org.junit.jupiter.api.Test;

public class MyTest {
    private void dummyTest() {
        new Memory(1024 * 9);
    }

    @Test
    public void find() {
        List<Integer> list = new ArrayList<>();
        list.add(new Integer(1));

        list.stream().map(arg -> {
            Pointer ptr = new Memory(1024 * 1024);
            return ptr;
        }).toArray(Pointer[]::new);
    }

    @Test
    public void test001() {
        dummyTest();
    }

    // [91 more identical tests named test002...test092 follow]
}

I run it in Docker (Ubuntu 24.04) with the packages openjdk-8-jdk and clang-17, building with Gradle and with AddressSanitizer preloaded. JNA is 5.12.1, JUnit is 5.8.2, and the host system is an up-to-date Arch Linux. The script to run it:

echo -e "leak:libjvm\nleak:libjli\nleak:libz\nleak:liblcms\nleak:liblcms2\nleak:libjavalcms\nleak:libawt\n" >> lsan.supp
export LD_PRELOAD="/usr/lib/llvm-17/lib/clang/17/lib/linux/libclang_rt.asan-x86_64.so"
export ASAN_OPTIONS="handle_segv=0"
export LSAN_OPTIONS="suppressions=$(pwd)/lsan.supp:print_suppressions=0"

./gradlew clean test

It consistently produces the following leak report:

Direct leak of 1048576 byte(s) in 1 object(s) allocated from:
    #0 0x759245cfb372 in malloc (/usr/lib/llvm-17/lib/clang/17/lib/linux/libclang_rt.asan-x86_64.so+0xfb372) (BuildId: 91f375f2a48c6b133a56d8cc059d017ae5de4982)
    #1 0x7592316185e6  (<unknown module>)
    #2 0x759231607bcf  (<unknown module>)
    #3 0x759231607bcf  (<unknown module>)
    #4 0x7592316080f5  (<unknown module>)
    #5 0x759231607e3f  (<unknown module>)
    #6 0x75923197befb  (<unknown module>)

SUMMARY: AddressSanitizer: 1048576 byte(s) leaked in 1 allocation(s).

This clearly refers to the native memory allocated by new Memory(1024 * 1024) in the "find" test. Since a Memory instance should free its native allocation when it is garbage-collected, I don't think it's a real leak; rather, it looks like JNA gets unloaded before that happens. However:

The malloc'd native memory is never touched in my code, so I'm considering the possibility of strange behaviour related to memory overcommit.

Changing the order of tests (by renaming them, since JUnit determines the order from method names) shows that this report appears when the "find" test runs after the "dummy" tests. When "find" runs before all the "dummy" tests, checking the output without suppressions reveals a different report for the same allocation (a sketch of pinning the order without renaming follows the report below):

Direct leak of 1048576 byte(s) in 1 object(s) allocated from:
    #0 0x7ec3f62fb372 in malloc (/usr/lib/llvm-17/lib/clang/17/lib/linux/libclang_rt.asan-x86_64.so+0xfb372) (BuildId: 91f375f2a48c6b133a56d8cc059d017ae5de4982)
    #1 0x7ec3e1c185e6  (<unknown module>)
    #2 0x7ec3e1c07bcf  (<unknown module>)
    #3 0x7ec3e1c07bcf  (<unknown module>)
    #4 0x7ec3e1c080f5  (<unknown module>)
    #5 0x7ec3e1c07e3f  (<unknown module>)
    #6 0x7ec3e1f75ebb  (<unknown module>)
    #7 0x7ec3e1fa91ab  (<unknown module>)
    #8 0x7ec3e1c07e3f  (<unknown module>)
    #9 0x7ec3e1c07e3f  (<unknown module>)
    #10 0x7ec3e1c07f26  (<unknown module>)
    #11 0x7ec3e1c004e6  (<unknown module>)
    #12 0x7ec3f1eb2884  (/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so+0x6b2884) (BuildId: d38655f827ac6a65fdbe1898b1278717fdb89a4e)
    #13 0x7ec3f221ac0b  (/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so+0xa1ac0b) (BuildId: d38655f827ac6a65fdbe1898b1278717fdb89a4e)
    #14 0x7ec3f221c166  (/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so+0xa1c166) (BuildId: d38655f827ac6a65fdbe1898b1278717fdb89a4e)
    #15 0x7ec3f1f66e83 in JVM_InvokeMethod (/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so+0x766e83) (BuildId: d38655f827ac6a65fdbe1898b1278717fdb89a4e)
    #16 0x7ec3e1c185e6  (<unknown module>)
    #17 0x7ec3e1c07e3f  (<unknown module>)
    #18 0x7ec3e1c07e3f  (<unknown module>)
    #19 0x7ec3e1c07f26  (<unknown module>)
    #20 0x7ec3e1c07e3f  (<unknown module>)
    #21 0x7ec3e1c07e3f  (<unknown module>)
    #22 0x7ec3e1c07f26  (<unknown module>)
    #23 0x7ec3e1c07f26  (<unknown module>)
    #24 0x7ec3e1c07e3f  (<unknown module>)
    #25 0x7ec3e1c07e3f  (<unknown module>)
    #26 0x7ec3e1c0813a  (<unknown module>)
    #27 0x7ec3e1c0813a  (<unknown module>)
    #28 0x7ec3e1c07e3f  (<unknown module>)
    #29 0x7ec3e1c07f26  (<unknown module>)
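As an aside on the reordering: instead of renaming methods, JUnit 5 can pin execution order explicitly. A minimal sketch, assuming JUnit 5.4+ (which provides MethodOrderer.OrderAnnotation and @Order); the class name is illustrative:

import org.junit.jupiter.api.MethodOrderer;
import org.junit.jupiter.api.Order;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.TestMethodOrder;

@TestMethodOrder(MethodOrderer.OrderAnnotation.class)
public class MyOrderedTest {
    @Test
    @Order(100) // set to 1 to run "find" before the dummy tests
    public void find() { /* body as in MyTest */ }

    @Test
    @Order(1)
    public void test001() { /* body as in MyTest */ }
}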

There are many other reported leaks (most likely false positives due to GC), but in my experiments none of them ever exhibits this behaviour where, depending on the order of tests, only asan and <unknown module> frames appear without any known Java library - though it's possible this just happens very rarely for other leak reports. Across experiments they differ in the number of bytes and the number of allocations, but there is always at least one known Java library (typically libjvm.so) in the trace.

Despite ASLR, it's possible to track the same calls across traces by the least significant bytes of their addresses. In the leak reports shown here, and in all others corresponding to the same allocation across experiments/runs, the addresses in frames #0-#5 always end with the same bytes, but the address in frame #6 varies.

I made a full repo that conveniently reproduces this error by calling run.sh. All relevant information and code is copied into this question.

My question here is: what exactly causes this specific leak report to contain only asan and <unknown module>s? To be clear, I don't care if <unknown module> appears sometimes, as long as there is always at least one known non-asan library that I can use to suppress the report.

Upvotes: 0

Views: 168

Answers (1)

Daniel Widdis

Reputation: 9131

You've somewhat answered your own question with this statement:

I don't think it's a real leak - rather, it looks like JNA getting unloaded before that happens.

Your intuition is correct here. You've got a race condition between JVM shutdown and the release of your allocations. The particular allocation you're reporting as problematic has multiple objects interacting with the Memory:

  • you create a List and add the allocation to that list
  • then you create a Pointer[] array from the list.

The Memory object itself is now referenced by two other objects. When GC runs, it first has to find both the List and the array and mark them unreachable before it gets around to marking the Memory unreachable. Even after the Java object is collected, it takes additional time for a (low-priority) finalizer or cleaner thread to work through its queue. Your question indicates differing behavior based on the order of the tests, which is additional evidence that this is simply a matter of not waiting long enough.

You've written a test that relies on the JVM to clean up, but memory cleanup in the JVM isn't necessarily prioritized.

  1. Garbage Collection is not guaranteed to run, and in fact if you start with a low amount of heap consumed by your program, it may not run until you have consumed enough heap for it to consider pausing your test's execution.
  2. Even when GC does run, generational collectors tend to prioritize more recently allocated objects, so when you run find() first, its allocation ends up at a lower priority.
  3. Depending on which GC you're using it may handle all this differently, prioritizing reducing memory or reducing pause times, etc.

In the leak reports shown here and all others corresponding to the same allocation in different experiments/runs, addresses #0-#5 always end with the same bytes, but for address #6, they vary.

We'd need more details to be certain, but it is likely that the addresses in frames #0-#5 are associated with a shared library loaded at a fixed location in memory, while the later addresses in the stack trace represent libraries that aren't preloaded, new allocations, etc.

My question here is: what exactly causes this specific leak report to contain only asan and <unknown module>s?

These are likely the last native memory cleanup steps, using JNI or other internal native JNA calls. The report you are seeing occurs at JVM shutdown, so conceivably it does allow for the Java object cleanup, but not necessarily for the full completion of currently executing native routines - in this case, the ones called last.

You didn't ask how to "fix" all this, but since JNA 5.12 the Memory class is Closeable. If you're not trying to reproduce a leak report, you could just wrap your memory allocations in a try-with-resources block. This would free the native allocations independently of the JVM's GC and, I suspect, eliminate your reports.
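A minimal sketch of that approach (class and method names are illustrative):

import com.sun.jna.Memory;

public class CloseableMemoryExample {
    void allocate() {
        // Since JNA 5.12, Memory implements Closeable; close() frees the
        // native allocation deterministically instead of waiting for GC
        // plus a finalizer/cleaner pass.
        try (Memory mem = new Memory(1024 * 1024)) {
            // use mem here, but don't let the pointer escape this block:
            // the native memory is freed as soon as the block exits
        }
    }
}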

Including some cleanup step in your JUnit test, even just a few milliseconds' delay, would likely help.
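A sketch of one such step, with illustrative names (System.gc() is only a hint, so this reduces false positives rather than guaranteeing their absence):

import org.junit.jupiter.api.AfterAll;

public class CleanupHintExample {
    @AfterAll
    static void drainNativeAllocations() throws InterruptedException {
        System.gc();              // ask the JVM to collect the unreachable Memory objects
        System.runFinalization(); // nudge the finalizer queue (still available on JDK 8)
        Thread.sleep(100);        // give the low-priority cleaner thread time before exit
    }
}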

Upvotes: 1
