Reputation: 15331
I have a very weird problem with GC in Java. I am running the following piece of code:
while (some condition) {
    // do a lot of work...
    logger.info("Generating resulting time series...");
    Collection<MetricTimeSeries> allSeries = manager.getTimeSeries();
    logger.info(String.format("Generated %,d time series! Storing in files now...", allSeries.size()));
    // for (MetricTimeSeries series : allSeries) {
    //     // just empty loop
    // }
}
When I look into JConsole at the start of each loop iteration, my old gen heap space settles at about 90 MB if I manually force a GC. But if I uncomment the loop, like this:
while (some condition) {
    // do a lot of work...
    logger.info("Generating resulting time series...");
    Collection<MetricTimeSeries> allSeries = manager.getTimeSeries();
    logger.info(String.format("Generated %,d time series! Storing in files now...", allSeries.size()));
    for (MetricTimeSeries series : allSeries) {
        // just empty loop
    }
}
then even if I force a GC, it won't fall below 550 MB. According to the YourKit profiler, the MetricTimeSeries objects are still reachable via the main thread's local variable (the collection) just after the GC at the start of a new iteration... And the collection is huge (250K time series)... Why is this happening and how can I "fight" this (incorrect?) behaviour?
Upvotes: 3
Views: 1872
Reputation: 11705
Since you're building a (large) ArrayList of time series, it will occupy the heap as long as it's referenced, and it will get promoted to the old generation if it stays there long enough (or if the young generation is too small to actually hold it). I'm not sure how you're associating the information you're seeing in JConsole or YourKit with a specific point in the program, but until the empty loop is optimized away by several JIT passes, your while loop will take longer and keep the collection referenced longer, which might explain the perceived difference while there's actually not much of one.
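If you want to watch that promotion happen, HotSpot can report the age of surviving objects at each young collection. A minimal example launch (these flags exist on the JDK 8 HotSpot VM; MyApp stands in for your main class):

    java -Xmn256m -verbose:gc -XX:+PrintTenuringDistribution MyApp

-Xmn sets the young generation size, so this also lets you check whether the collection is simply too big to fit there and gets promoted immediately.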
There's nothing incorrect about that behaviour. If you don't want to consume so much memory, you need to change your Collection so that it's not an eagerly-filled ArrayList but a lazy collection, more of a stream (if you've ever done XML processing, think DOM vs. SAX), which gets evaluated as it's iterated. If you don't need the whole collection to be sorted, that's doable, especially since you seem to be saying that the collection is a concatenation of sub-collections returned by underlying objects.
If you can change your return type from Collection to Iterable, you could for example use Guava's FluentIterable.transformAndConcat() to transform the collection of underlying objects into a lazily-evaluated Iterable concatenation of their time series. Of course, the size of the collection is no longer directly available (and if you try to get it independently of the iteration, you'll evaluate the lazy collection twice).
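A minimal sketch of that approach (again assuming the hypothetical getSources() and getSeries() accessors):

    import com.google.common.collect.FluentIterable;

    // Returns a lazily-evaluated concatenation of each source's series;
    // nothing is materialized until the caller actually iterates.
    public Iterable<MetricTimeSeries> getTimeSeries() {
        return FluentIterable.from(manager.getSources())
                .transformAndConcat(source -> source.getSeries());
    }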
Upvotes: 1
Reputation: 5803
Yup, the garbage collector can be mysterious... but it beats managing your own memory ;)
Collections and maps have a way of hanging onto references longer than you might like, and thus preventing garbage collection when you might expect it. As you noticed, setting the allSeries reference to null will itself earmark the collection for garbage collection, and thus its contents are up for grabs as well. Another way would be to call allSeries.clear(): this will unlink all of its MetricTimeSeries objects, and they will be free for garbage collection.
Why does removing the loop also get around this problem? That's the more interesting question. I'm tempted to suggest the compiler is optimizing away the reference to allSeries... but you are still calling allSeries.size(), so it can't completely optimize out the reference.
To muddy the waters, different compilers (and settings) behave differently and use different garbage collectors, which themselves behave differently. It's tough to say exactly what's happening under the hood without more information.
Upvotes: 2