Reputation: 936
I'm struggling with getting the right settings for my JVM.
Here's the use case: Tomcat is serving requests (300 req/s), but they are very fast (key-value lookups), so I don't have any performance problems. Everything works fine until I have to refresh the data it's serving, which happens every 3 hours. You can imagine I have a big HashMap and I'm just doing lookups. During a data reload I create a temporary HashMap and then swap it in. I need to load quite a lot of data (~800MB in memory every time).
The problem is that during those loads Tomcat stops responding from time to time. Initially the cause was promotion failures and full GCs, but I got around those by tweaking the settings.
As you might notice, I already decreased the occupancy at which the CMS collector kicks in. I don't get promotion failures or anything like that any more. The young generation is kept reasonably small to make minor collections fast. I've increased SurvivorRatio because all the request objects die young, and whatever doesn't (the data being loaded) should be promoted to the old generation automatically.
But I'm still seeing 503 errors in Tomcat during the data load. According to gc.log, my minor collections become slow during this process: they now take seconds instead of milliseconds. I've tried slowing down the load process to give the GC a breather, but it doesn't seem to help. The problem is worst at the moment I reach the capacity of the old generation: CMS kicks in and frees up memory, and afterwards allocations are pretty slow. I don't see any errors in gc.log any more. What can I do differently? I know fragmentation might be a problem, but I'm not getting promotion failures. The machine is an 8-core server. Does decreasing the number of GC threads make any sense? Would setting a lower thread priority for the data loading thread help?
Is there a way to kick off the CMS collector periodically in the background? The data that's being swapped out can actually be garbage collected immediately.
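For reference, the swap looks roughly like the sketch below. The class and method names are made up for illustration, and the real loader fills the new map from the data source rather than copying an existing one.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for the lookup table; names are illustrative only. */
final class LookupTable {
    // volatile makes the swapped-in map visible to request threads without locking.
    private volatile Map<String, String> current = Collections.<String, String>emptyMap();

    /** Called by request threads: one volatile read, then a plain map lookup. */
    String get(String key) {
        return current.get(key);
    }

    /** Called by the loader thread on each refresh. */
    void reload(Map<String, String> freshData) {
        // Build the replacement off to the side; readers still see the old map.
        Map<String, String> next = new HashMap<String, String>(freshData);
        // A single reference write publishes the new map. The old map becomes
        // unreachable once in-flight requests finish with it.
        current = next;
    }
}
```

While the reload runs, both the old and the new map are live, so the heap has to hold two copies of the data at the peak.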
I'm open to any suggestions!
Here are my JVM settings.
-Xms14g
-Xmx14g
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+AlwaysPreTouch
-XX:MaxNewSize=256m
-XX:NewSize=256m
-XX:MaxPermSize=128m
-XX:PermSize=128m
-XX:SurvivorRatio=24
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=88
-XX:+UseCompressedStrings
-XX:+DisableExplicitGC
JDK 1.6.33 Tomcat 6
gc.log snippet:
line 7 the data load starts
line 20 it stops
Upvotes: 6
Views: 498
Reputation: 8476
Your load lasts about 90s and is interrupted by a GC every 1s or so, yet you have a 14G heap with a steady-state occupancy (assuming the surrounding log lines are steady state) of only about 5G, which means you have a lot of memory going to waste. I think the previous answer looks to be correct (based on the data presented) when it says your survivor spaces are too small. If it really does nothing but lookups the rest of the time, then a perfectly reasonable strategy would be something like
The point here is to try to completely avoid a young collection during the load phase. However, a tenuring threshold of 0 would mean the previous version would likely be in tenured, and you'd eventually see a possibly lengthy collection to clean it up. Another option might be to go the other way round: make tenured just big enough to fit 2-3 versions of the data and give the rest to eden, with a view to minimising the frequency of young collections and helping tenured be collected as quickly as possible.
What works best really depends on what else the app is doing the rest of the time.
The CMS trigger seems quite high for a large heap, by the way: if you only start collecting at 88%, does it have time to finish the job before a full GC is forced? I suppose it might be quite safe if you're actually doing very little allocation most of the time.
Upvotes: 1
Reputation: 13696
Looking at the attached log and seeing those huge increases in minor GC times leads me to believe that your machine is under extremely heavy load from processes other than the JVM.
My reasoning is that while a minor GC is taking place, all application threads are stopped. Hence, nothing your application does should be able to affect the minor GC times, given that your new gen is constant in size.
However, if there is a lot of load from other processes on the machine during this time, the GC threads will compete for execution time and you could see this behavior.
Could you check the CPU usage from other processes when your data load is running?
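If you have no monitoring in place, a rough way to sample this from inside the JVM is the standard `OperatingSystemMXBean`. This is only a sketch: `LoadProbe` is a hypothetical helper, the load figure is system-wide (so it includes the JVM itself), and a tool such as top gives the per-process breakdown that this cannot.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper: samples the system load so it can be logged during the data load. */
final class LoadProbe {
    static String sample() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // System-wide load average over the last minute; negative if the
        // platform does not report it.
        double load = os.getSystemLoadAverage();
        int cpus = os.getAvailableProcessors();
        return "loadavg=" + load + " cpus=" + cpus;
    }

    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```

A load average well above the number of cores while the slow minor GCs are happening would support the theory that the GC threads are being starved of CPU.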
Edit: Looking a bit more at the logs, I came up with another possible explanation.
It seems that the target survivor space is full (ParNew goes down to exactly 10048K after each "slow" GC). That would mean that objects are promoted directly to the old gen, which could well slow things down. I would try increasing the size of the new gen and lowering the survivor ratio. Maybe even try running without setting the new gen size or the survivor ratio at all and see how the JVM manages to optimize this (although beware that the JVM usually does a poor job of optimizing for bursts like this).
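To put a number on the survivor size: with the flags in the question, each survivor space comes out at roughly 10 MB. Below is a back-of-the-envelope sketch in Java. The class name is made up, and the 64 KB alignment is my assumption about how HotSpot rounds generation sizes, chosen because it reproduces the 10048K figure from the log.

```java
/** Rough survivor-space arithmetic for the flags in the question (illustrative only). */
final class SurvivorSizing {
    // Assumption: HotSpot rounds each space down to a 64 KB boundary.
    static final long ALIGNMENT_KB = 64;

    /** Each survivor space is newGen / (SurvivorRatio + 2), rounded down to the alignment. */
    static long survivorKb(long newGenKb, int survivorRatio) {
        long raw = newGenKb / (survivorRatio + 2);
        return (raw / ALIGNMENT_KB) * ALIGNMENT_KB;
    }

    /** Eden is whatever the two survivor spaces leave over. */
    static long edenKb(long newGenKb, int survivorRatio) {
        return newGenKb - 2 * survivorKb(newGenKb, survivorRatio);
    }

    public static void main(String[] args) {
        long newGenKb = 256L * 1024; // -XX:NewSize=256m
        System.out.println(survivorKb(newGenKb, 24)); // prints 10048
        System.out.println(edenKb(newGenKb, 24));     // prints 242048
        System.out.println(survivorKb(newGenKb, 8));  // prints 26176
    }
}
```

So with SurvivorRatio=24 only about 10 MB of a 256 MB new gen can hold survivors, and anything beyond that goes straight to the old gen. At a ratio of 8 the same new gen would give about 26 MB per survivor space, and a larger new gen scales that up further.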
Upvotes: 2