dimo414

Reputation: 48804

Prevent Java 7 from premature GC

Similar to "Can JIT be prevented from optimising away method calls?", I'm attempting to track the memory usage of long-lived data store objects. However, I'm finding that if I initialize a store, log the system memory, then initialize another store, sometimes the compiler (presumably the JIT) is smart enough to notice that the earlier objects are no longer needed.

public class MemTest {
    public static void main(String[] args) {
       logMemory("Initial State");
       MemoryHog mh = new MemoryHog();
       logMemory("Built MemoryHog");
       MemoryHog mh2 = new MemoryHog();
       logMemory("Built Second MemoryHog"); // by here, mh may be GCed
    }
}
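
The question does not show logMemory() or MemoryHog. As a rough sketch only, a helper of this kind might estimate used heap from Runtime; the class name MemoryLogSketch, the usedBytes() helper, and the byte array standing in for MemoryHog are all assumptions made for illustration, not the asker's code:

```java
// Hypothetical helper: the question never shows logMemory(), so this is one guess at it.
final class MemoryLogSketch {
    /** Heap bytes currently in use as reported by the JVM (an estimate, not an exact figure). */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Requests a GC (only a hint to the JVM), then prints the label with the used-heap estimate. */
    static long logMemory(String label) {
        System.gc();
        long used = usedBytes();
        System.out.println(label + ": " + used + " bytes used");
        return used;
    }

    public static void main(String[] args) {
        logMemory("Initial State");
        byte[] hog = new byte[8 * 1024 * 1024]; // stand-in for MemoryHog
        logMemory("Built 8 MiB array");
        // Reading the array here keeps it in use until after the last log call.
        System.out.println("array length: " + hog.length);
    }
}
```

Because System.gc() is only a request, the figures from a helper like this can vary between runs, which is part of why the timing of collection matters for the measurements discussed here.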

Now the suggestion in the linked thread is to keep a reference to these objects, but the GC appears to be smart enough to tell that the objects aren't used by main() anymore. I could add a call to these objects after the last logMemory() call, but that's a rather manual solution: every time I test an object, I have to make some sort of side-effect-triggering call after the final logMemory() call, or I may get inconsistent results.

I'm looking for general case solutions; I understand that adding a call like System.out.println(mh.hashCode()+mh2.hashCode()) at the end of the main() method would be sufficient, but I dislike this for several reasons. First, it introduces an external dependency on the testing above - if the SOUT call is removed, the behavior of the JVM during the memory logging calls may change. Second, it's prone to user-error; if the objects being tested above change, or new ones are added, the user must remember to manually update this SOUT call as well, or they'll introduce difficult to detect inconsistencies in their test. Finally, I dislike that this solution prints at all - it seems like an unnecessary hack that I can avoid with a better understanding of the JIT's optimizations. To the last point, Patricia Shanahan's answer offers a reasonable solution (explicitly print that the output is for memory sanity purposes) but I'd still like to avoid it if possible.

So my initial solution is to store these objects in a static list, and then iterate over them in the main class's finalize method*, like so:

import java.util.ArrayList;

public class MemTest {
    private static ArrayList<Object> objectHolder = new ArrayList<>();

    public static void main(String[] args) {
       logMemory("Initial State", null);
       MemoryHog mh = new MemoryHog();
       logMemory("Built MemoryHog", mh); // adds mh to objectHolder
       MemoryHog mh2 = new MemoryHog();
       logMemory("Built Second MemoryHog", mh2); // adds mh2 to objectHolder
    }

    protected void finalize() throws Throwable {
        for(Object o : objectHolder) {
            o.hashCode();
        }
    }
}

But now I've only offloaded the problem one step: what if the JIT optimizes away the loop in the finalize method, and decides these objects don't need to be saved? Admittedly, maybe simply holding the objects in the main class is enough for Java 7, but unless it's documented that the finalize method can't be optimized away, there's still nothing theoretically preventing the JIT/GC from getting rid of these objects early, since there are no side effects in the contents of my finalize method.

One possibility would be to change the finalize method to:

protected void finalize() throws Throwable {
    int codes = 0;
    for(Object o : objectHolder) {
        codes += o.hashCode();
    }
    System.out.println(codes);
}

As I understand it (and I could be wrong here), calling System.out.println() will prevent the JIT from getting rid of this code, since it's a method with external side effects, so even though it doesn't impact the program, it can't be removed. This is promising, but I don't really want some sort of gibberish being output if I can help it. The fact that the JIT can't (or shouldn't!) optimize away System.out.println() calls suggests to me that the JIT has a notion of side effects, and if I can tell it this finalize block has such side effects, it should never optimize it away.

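So my questions:

  • Is holding a list of objects in the main class enough to prevent them from ever being GCed?
  • Is looping over those objects and calling something trivial like .hashCode() in the finalize method enough?
  • Are there other methods (like System.out.println) the JIT is aware of that cannot be optimized away, or even better, is there some way to tell the JIT not to optimize away a method call / code block?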

*Some quick testing confirms, as I suspected, that the JVM doesn't generally run the main class's finalize method; it simply exits. The JIT/GC may still not be smart enough to GC my objects simply because the finalize method exists, even if it doesn't get run, but I'm not confident that's always the case. If it's not documented behavior, I can't reasonably trust it will remain true, even if it's true now.

Upvotes: 1

Views: 196

Answers (2)

Stephen C

Reputation: 718758

Yes, it would be legal for mh to be garbage collected at that point, because no later code could possibly use the variable. If the JVM can detect this, then the corresponding MemoryHog object will be treated as unreachable ... if the GC were to run at that point.

A later call like System.out.println(mh) would be sufficient to inhibit collection of the object. So would using it in a "computation"; e.g.

    if (mh == mh2) { System.out.println("the sky is falling!"); }

Is holding a list of objects in the main class enough to prevent them from ever being GCed?

It depends on where the list is declared. If the list were a local variable, and it became unreachable before mh, then putting the object into the list would make no difference.

Is looping over those objects and calling something trivial like .hashCode() in the finalize method enough?

By the time the finalize method is called, the GC has already decided that the object is unreachable. The only way that the finalize method could prevent the object being deleted would be to add it to some other (reachable) data structure or assign it to a (reachable) variable.

Are there other methods (like System.out.println) the JIT is aware of that cannot be optimized away,

Yes ... anything that makes the object reachable.

or even better, is there some way to tell the JIT not to optimize away a method call / code block?

No way to do that ... apart from making sure that the method call or code block does something that contributes to the computation being performed.
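
That holds for the Java 7 in the question. For readers on Java 9 or later, the standard library does offer java.lang.ref.Reference.reachabilityFence(Object), which keeps its argument strongly reachable up to the point of the call without printing anything. A minimal sketch, assuming Java 9+ and using byte arrays as stand-ins for MemoryHog (the FenceSketch class and buildAndHold() method are made up for illustration):

```java
import java.lang.ref.Reference;

// Sketch only: requires Java 9+, so it does not apply to the Java 7 setup in the question.
final class FenceSketch {
    /** Allocates two buffers and keeps both strongly reachable until the fences run. */
    static int buildAndHold() {
        byte[] first = new byte[1024 * 1024];   // stand-in for the first MemoryHog
        byte[] second = new byte[1024 * 1024];  // stand-in for the second MemoryHog
        // ... the memory logging calls would go here ...
        int total = first.length + second.length;
        // Each fence guarantees its argument is not reclaimed before this point.
        Reference.reachabilityFence(first);
        Reference.reachabilityFence(second);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(buildAndHold());
    }
}
```

The fences still have to be placed after the last measurement and updated when objects are added, so this removes the printing but not the manual bookkeeping.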


UPDATE

First, what is going on here is not really JIT optimization. Rather, the JIT is emitting some kind of "map" that the GC is using to determine when local variables (i.e. variables on the stack) are dead ... depending on the program counter (PC).

Your examples to inhibit collection all involve blocking the JIT via SOUT, I'd like to avoid that somewhat hacky solution.

Hey ... ANYTHING that depends on the exact timing of when things are garbage collected is a hack. You are not supposed to do that in a properly engineered application.

I updated my code to make it clear that the list that's holding my objects is a static variable of the main class, but it seems if the JIT's smart enough it could still theoretically GC these values once it knows the main method doesn't need them.

I disagree. In practice, the JIT cannot determine that a static will never be referenced. Consider these cases:

  • Before the JIT runs, it appears that nothing will use static s again. After the JIT has run, the application loads a new class that refers to s. If the JIT "optimized" the s variable, the GC would treat it as unreachable, and either null it or create a dangling reference. When the dynamically loaded class then looked at s it would see the wrong value ... or worse.

  • If the application ... or any libraries used by the application ... uses reflection, then it can refer to the value of any static variable without this being detectable by the JIT.
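
The reflection case can be sketched as follows. The class StaticHolderSketch and the method sizeViaReflection() are hypothetical names chosen for illustration; the point is only that the field name is an ordinary string at run time, so the JIT cannot know in advance which statics will be read:

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

final class StaticHolderSketch {
    // Hypothetical stand-in for the question's static objectHolder list.
    static final List<Object> objectHolder = new ArrayList<>();

    /** Looks the static field up by name at run time and returns the size of the list it holds. */
    static int sizeViaReflection(String fieldName) {
        try {
            Field f = StaticHolderSketch.class.getDeclaredField(fieldName);
            List<?> held = (List<?>) f.get(null); // null receiver because the field is static
            return held.size();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        objectHolder.add(new byte[1024]);
        // The field name could just as well come from args or a config file.
        System.out.println(sizeViaReflection("objectHolder"));
    }
}
```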

So while it is theoretically possible to do this optimization in a small number of cases:

  • in the vast majority of cases, you can't, and
  • in the few cases that you can, the pay-off (in terms of performance improvement) is most likely negligible.

I similarly updated my code to clarify that I'm talking about the finalize method of the main class.

The finalize method of the main class is irrelevant because:

  • you are not creating an instance of the main class, and
  • the finalize method CANNOT refer to the local variables of another method (e.g. the main method).

... its existence prevents the JIT from nuking my static list.

Not true. The static list can't be nuked anyway; see above.

As I understand it, there's something special about SOUT that the JIT is aware of that prevents it from optimizing such calls away.

There is nothing special about sout. It is just something that we KNOW influences the results of the computation, and that we therefore KNOW the JIT cannot legally optimize away.

Upvotes: 1

Patricia Shanahan

Reputation: 26185

Here's a plan that may be overkill, but should be safe and reasonably simple:

  • Keep a List of references to the objects.
  • At the end, iterate over the list summing the hashCode() results.
  • Print the sum of the hash codes.

Printing the sum ensures that the final loop cannot be optimized out. The only thing you need to do for each object you create is pass it to the List's add call.
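
The plan above might look roughly like this. The names HoldAndSumSketch, hold() and sumHashCodes() are made up for illustration, and plain Object instances stand in for the question's MemoryHog:

```java
import java.util.ArrayList;
import java.util.List;

final class HoldAndSumSketch {
    private static final List<Object> held = new ArrayList<>();

    /** Registers an object so it stays reachable, and returns it for chaining. */
    static <T> T hold(T obj) {
        held.add(obj);
        return obj;
    }

    /** Sums hashCode() over every held object; the caller prints the result. */
    static int sumHashCodes() {
        int sum = 0;
        for (Object o : held) {
            sum += o.hashCode();
        }
        return sum;
    }

    public static void main(String[] args) {
        Object first = hold(new Object());   // stand-in for new MemoryHog()
        Object second = hold(new Object());
        // ... memory logging calls would go here ...
        // Labeling the output makes it clear the number exists only to keep objects alive.
        System.out.println("memory sanity checksum: " + sumHashCodes());
    }
}
```

Wrapping each allocation in hold() means new test objects are registered at the point of creation, so the final print does not need to be edited when the set of objects changes.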

Upvotes: 1
