Reputation: 1215
I need to store in memory a huge amount of data by looping over a result set (Oracle database).
I carried out some tests and, using a profiler, I noticed that there is a considerable difference between the heap size and the used heap (i.e. my data). Here is an example.
I have already looked at the available JVM arguments for setting the right heap size, but the problem is that I don't know in advance how many bytes the data will occupy (the amount of data can vary from one test to another).
By observing the graph in the image, the problem seems to be the memory "peaks" during the execution.
Could these peaks be related to the number of fetched rows (or, in general, to the extracted data)?
Is there a way to avoid this effect, by keeping memory almost constant (so that the heap size doesn't increase excessively)?
Thanks
Upvotes: 1
Views: 1792
Reputation: 892
By looking at your memory chart, it seems much of the data is temporary and can be removed from the heap at some point. The final ratio of used heap to total heap size says it all.
It seems like the temporary data's (e.g. buffered data from an Oracle ResultSet) time to live is too high, or the eden space is too small, so data is being moved from the eden and/or survivor space to the old generation, where it is collected once the JVM detects the need to run the GC on the old generation. This most likely happens when you iterate over your ResultSet and the Oracle driver needs to fetch the next chunk of data from the database, which can be fairly large.
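How much the driver buffers per round trip is controlled by the standard JDBC fetch size (Oracle's default is 10 rows). A sketch of setting it, where the connection URL, credentials, query, and the per-row byte figure are all placeholders you would replace with your own:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class FetchSizeSketch {
    // Hypothetical average row footprint; measure your own schema with a profiler.
    static final int AVG_ROW_BYTES = 2_000;

    // Rough upper bound on the driver's per-fetch buffer:
    // a smaller fetch size means smaller (but more frequent) buffer allocations.
    static long estimatedBufferBytes(int fetchSize) {
        return (long) fetchSize * AVG_ROW_BYTES;
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder connection details.
        try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//host:1521/service", "user", "pwd");
             Statement st = con.createStatement()) {
            st.setFetchSize(100); // driver refills its buffer every 100 rows
            try (ResultSet rs = st.executeQuery("SELECT id, payload FROM big_table")) {
                while (rs.next()) {
                    // process each row; buffered chunks become garbage once consumed
                }
            }
        }
    }
}
```

Raising the fetch size reduces round trips at the cost of bigger short-lived buffers on the heap, which is exactly the kind of allocation that shows up as sawtooth peaks.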
At this point I should go into a little detail about the Oracle ResultSet buffer. It's basically just a chunk of bytes on the heap. Depending on the column type, the data is stored as something different from what you'd read from the ResultSet. Take a java.sql.Timestamp for instance: inside the buffer it's stored as an oracle.sql.TIMESTAMP or even just plain bytes. This means that whenever you extract a java.sql.Timestamp from a ResultSet, another object has to be allocated. And that object is most likely the "final" object you want to keep eventually.
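To make that per-row allocation concrete, here is a standalone sketch (no database needed): it mimics what happens on each getTimestamp call, where the driver decodes its buffered bytes into a brand-new object every time, even when the values are equal.

```java
import java.sql.Timestamp;

public class TimestampAllocation {
    public static void main(String[] args) {
        long millis = 1_700_000_000_000L;
        // Simulates two rs.getTimestamp(...) calls for rows holding
        // the same value: each call allocates a fresh object.
        Timestamp first = new Timestamp(millis);
        Timestamp second = new Timestamp(millis);
        System.out.println(first.equals(second)); // true: same value
        System.out.println(first == second);      // false: two separate allocations
    }
}
```

Iterating over millions of rows therefore produces millions of short-lived objects, which is what keeps the GC busy.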
I suggest tuning the JVM's GC to your needs. Maybe you can figure out which data is being constantly collected. Try adjusting the eden size so the JVM doesn't need to promote too much to the old generation. You can also adjust how much new space the JVM allocates on demand and how it shrinks when detecting a gap in usage and allocated size.
You can find a list of JVM options here.
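For example, the generation sizing and heap grow/shrink behavior mentioned above map to HotSpot flags like these (the values are purely illustrative, not recommendations; tune them against your own profile):

```
-Xms512m -Xmx2g          # initial and maximum heap size
-XX:NewSize=256m         # initial young generation (eden + survivor) size
-XX:MaxNewSize=512m      # maximum young generation size
-XX:SurvivorRatio=8      # eden-to-survivor-space size ratio
-XX:MinHeapFreeRatio=40  # grow the heap if free space after GC falls below 40%
-XX:MaxHeapFreeRatio=70  # shrink the heap if free space after GC exceeds 70%
```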
Upvotes: 1
Reputation: 718788
Could these peaks be related to the number of fetched rows (or, in general, to the extracted data)?
I presume you are referring to the blue peaks.
The blue area represents the memory used at any given point in time, and the peaks represent points at which the garbage collector runs. As you can see, the line slopes up at an angle to each peak, and then falls vertically. This is normal behavior.
You will also notice that heights of the peaks and troughs are trending upward. This is most likely the effect of your application's in-memory data structure growing.
Is there a way to avoid this effect, by keeping memory almost constant (so that the heap size doesn't increase excessively)?
Basically, no, there isn't. If the blue line weren't jagged, or if the peaks were shallower and closer together, that would mean the GC is running more frequently ... which would be bad for performance.
Basically, if you are building a big data structure in memory, you need enough memory to represent it, PLUS a bunch of extra space for temporary objects and to give the garbage collector room to do what it needs to do.
If your concern is that your application is using too much memory overall, then you need to optimize the in-memory data structure you are building, and check to see you don't have any (other) memory leaks.
If your concern is that you can't predict how big the Java heap needs to be, then consider running the SQL query as a COUNT first, and then start / restart the Java application with a heap size estimate based on the count.
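A sketch of that idea. The per-row byte figure and the 50% headroom factor are assumptions you would calibrate against a profiled sample run; the row count would come from your COUNT query:

```java
public class HeapEstimate {
    // Hypothetical per-row footprint; measure it with a profiler on a sample run.
    static final long BYTES_PER_ROW = 1_500;

    // Suggested -Xmx in megabytes: the data itself plus ~50% headroom
    // for temporary objects and GC breathing room, with a sane floor.
    static long suggestedXmxMb(long rowCount) {
        long dataMb = rowCount * BYTES_PER_ROW / (1024 * 1024);
        return Math.max(256, dataMb + dataMb / 2);
    }

    public static void main(String[] args) {
        long rows = 2_000_000; // would come from SELECT COUNT(*) ...
        System.out.println("-Xmx" + suggestedXmxMb(rows) + "m");
    }
}
```

A wrapper script could run the COUNT, call something like this, and then launch (or relaunch) the real application with the computed -Xmx value.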
Upvotes: 1
Reputation: 3701
Of course you can limit the memory, but there is not much benefit in doing so. If you do, garbage collection will have to happen more often, which will slow down your program.
This is simply the way garbage collection works in Java. If you have enough memory, the GC is called less often, which leaves your application more resources (CPU time).
Also, to optimize memory consumption, you should check your algorithms and see whether you can reuse some objects instead of creating new ones, because new objects are exactly what makes the blue line go up. See the Flyweight pattern and other similar patterns used to control memory consumption.
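A minimal flyweight-style sketch for this situation (class and method names are illustrative): when a column holds a limited set of repeated values (status codes, country names, ...), caching a canonical instance means each distinct value costs one object instead of one object per fetched row.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ValueCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Returns a canonical instance: equal values share one object
    // instead of allocating a new String for every row.
    public String canonical(String value) {
        return cache.computeIfAbsent(value, v -> v);
    }

    public static void main(String[] args) {
        ValueCache statuses = new ValueCache();
        String a = statuses.canonical(new String("OPEN"));
        String b = statuses.canonical(new String("OPEN"));
        System.out.println(a == b); // true: the second lookup reuses the first instance
    }
}
```

For Strings specifically, String.intern() achieves a similar effect with the JVM's own pool; a small cache like this generalizes the idea to other immutable value types.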
Upvotes: 1