Reputation: 424
I create a fixed thread pool using forPool = Executors.newFixedThreadPool(poolSize);
where poolSize is initialized to the number of cores on the processor (let's say 4). In some runs it works fine and the CPU utilisation is consistently at 400%.
But sometimes the usage drops to 100% and never rises back to 400%. I have thousands of tasks scheduled, so that is not the problem. I catch every exception, but no exception is thrown. So the issue is random and not reproducible, but very much present. The tasks are data-parallel operations. At the end of each task there is synchronised access to update a single variable, so it is highly unlikely that I have a deadlock there. In fact, once I spot this issue, even if I destroy the pool and create a fresh one of size 4, it is still only at 100% usage. There is no I/O.
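To make the setup concrete, here is a simplified sketch of what I am doing (class and method names are made up for this post; the real code is more involved):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelRunner {
    private final Object lock = new Object();
    private long result; // the single variable updated under synchronisation

    public void runAll(Iterable<Runnable> tasks) throws InterruptedException {
        int poolSize = Runtime.getRuntime().availableProcessors(); // e.g. 4
        ExecutorService forPool = Executors.newFixedThreadPool(poolSize);
        for (Runnable task : tasks) {
            forPool.submit(() -> {
                task.run();               // CPU-bound, data-parallel work, no I/O
                synchronized (lock) {     // brief synchronised update at the end of each task
                    result++;
                }
            });
        }
        forPool.shutdown();
        forPool.awaitTermination(1, TimeUnit.HOURS);
    }
}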
It seems counter-intuitive to Java's assurance of a "FixedThreadPool". Am I reading the guarantee wrong? Is only concurrency guaranteed, and not parallelism?
And to the question: have you come across this issue and solved it? If I want parallelism, am I doing the correct thing?
Thanks!
On taking a thread dump, I find that there are 4 threads, all doing their parallel operations, but the usage is still only ~100%. Here are the thread dumps at 400% usage and 100% usage. I set the number of threads to 16 to trigger the scenario. It runs at 400% for a while and then drops to 100%. When I use 4 threads, it runs at 400% and only rarely drops to 100%. This is the parallelization code.
****** [MAJOR UPDATE] ******
It turns out that if I give the JVM a huge amount of memory to play with, this issue goes away and the performance does not drop. But I don't know how to use this information to solve the problem. Help!
Upvotes: 3
Views: 1800
Reputation: 3549
Increasing the size of the Java heap usually improves throughput until the heap no longer resides in physical memory. When the heap size exceeds the physical memory, the heap begins swapping to disk which causes Java performance to drastically decrease. Therefore, it is important to set the maximum heap size to a value that allows the heap to be contained within physical memory.
Since you give the JVM ~90% of physical memory on the machines, the problem may be related to I/O caused by memory paging and swapping when you try to allocate memory for more objects. Note that physical memory is also used by other running processes as well as the OS. Also, since the symptoms occur after a while, this may be an indication of a memory leak.
Try to find out how much physical memory is actually available (not already used) and allocate ~90% of that to your JVM heap.
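If it helps, here is a small sketch for checking how much physical memory is actually free before picking -Xmx (it casts to com.sun.management.OperatingSystemMXBean, so it assumes a HotSpot/OpenJDK JVM):

import java.lang.management.ManagementFactory;

public class MemoryCheck {
    public static void main(String[] args) {
        // HotSpot-specific interface; on other JVMs this cast may fail
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        long totalPhysical = os.getTotalPhysicalMemorySize();
        long freePhysical = os.getFreePhysicalMemorySize();
        long maxHeap = Runtime.getRuntime().maxMemory(); // what -Xmx currently gives the JVM
        System.out.printf("physical: %d MB total, %d MB free; JVM max heap: %d MB%n",
                totalPhysical >> 20, freePhysical >> 20, maxHeap >> 20);
    }
}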
What happens if you leave the system running for an extended period of time?
Does it ever come back to 400% CPU utilization?
Take a look at the following link for tuning: http://java.sun.com/performance/reference/whitepapers/tuning.html#section4
Upvotes: 0
Reputation: 8274
This is almost certainly due to GC.
If you want to be sure, add the following startup flags to your Java program:
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
and check stdout.
You will see lines containing "Full GC", including the time this took; during this time you will see 100% CPU usage.
The default garbage collector on multi-CPU or multi-core machines is the throughput collector, which collects the young generation in parallel but uses serial collection (in one thread) for the old generation.
So what is probably happening in your 100% CPU example is that a GC of the old generation is going on; this is done in one thread and so keeps only one core busy.
Suggested solution: use the concurrent mark-and-sweep collector by adding the flag
-XX:+UseConcMarkSweepGC
at JVM startup.
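If you would rather confirm the GC activity from inside the application instead of reading the log, here is a small sketch using the standard GarbageCollectorMXBean API (the collector names it prints vary between JVMs and the collectors chosen):

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcWatcher {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                // getCollectionTime() is the accumulated GC time in milliseconds for this collector
                System.out.printf("%s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            Thread.sleep(5000);
        }
    }
}

If the old-generation collector's accumulated time keeps growing during the periods when the CPU is stuck at 100%, GC is indeed the culprit.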
Upvotes: 1
Reputation: 3417
My answer is based on a mixture of knowledge about JVM memory management and some guesses about facts which I could not find precise information on. I believe that your problem is related to the thread-local allocation buffers (TLABs) Java uses:
A Thread Local Allocation Buffer (TLAB) is a region of Eden that is used for allocation by a single thread. It enables a thread to do object allocation using thread local top and limit pointers, which is faster than doing an atomic operation on a top pointer that is shared across threads.
Let's say you have an eden size of 2M and use 4 threads: the JVM may choose a TLAB size of (eden/64)=32K, and each thread gets a TLAB of that size. Once a thread's 32K TLAB is exhausted, it needs to acquire a new one, which requires global synchronization. Global synchronization is also needed for allocation of objects which are larger than the TLAB.
But, to be honest with you, things are not as easy as I described: the JVM adaptively sizes a thread's TLAB based on its estimated allocation rate determined at minor GCs [1], which makes TLAB-related behavior even less predictable. However, I can imagine that the JVM scales the TLAB sizes down when more threads are working. This seems to make sense, because the sum of all TLABs must be less than the available eden space (in practice even only some fraction of the eden space, in order to be able to refill the TLABs).
Let us assume a fixed TLAB size per thread of (eden size / (16 * user threads working)):
You can imagine that 16 threads, which exhaust their smaller TLABs faster, will cause many more lock acquisitions on the TLAB allocator than 4 threads with 32K TLABs.
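A quick worked example of that assumed sizing rule (the divisor of 16 is my assumption from above, not a documented constant):

public class TlabEstimate {
    public static void main(String[] args) {
        long eden = 2 * 1024 * 1024;                 // assumed eden size: 2 MB, as above
        for (int threads : new int[] {4, 16}) {
            long tlab = eden / (16L * threads);      // assumed rule: eden / (16 * working threads)
            long refillsPerMb = (1024 * 1024) / tlab;
            System.out.printf("%2d threads -> %3d KB TLAB, ~%3d TLAB refills per MB allocated per thread%n",
                    threads, tlab / 1024, refillsPerMb);
        }
    }
}

With 4 threads each TLAB is 32K and a thread refills it about 32 times per MB it allocates; with 16 threads the TLAB shrinks to 8K and the refill rate quadruples, and every refill goes through the globally synchronized slow path.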
To conclude, when you decrease the number of working threads or increase the memory available to the JVM, the threads can be given larger TLABs and the problem is solved.
https://blogs.oracle.com/daviddetlefs/entry/tlab_sizing_an_annoying_little
Upvotes: 2
Reputation: 6901
Given the fact that increasing your heap size makes the problem 'go away' (perhaps not permanently), the issue is probably related to GC.
Is it possible that the Operation implementation is generating some state, stored on the heap, between calls to
pOperation.perform(...);
? If so, then you might have a memory usage problem, perhaps a leak. As more tasks complete, more data sits on the heap. The garbage collector has to work harder and harder to try to reclaim as much as it can, gradually taking up 75% of your total available CPU resources. Even destroying the ThreadPool won't help, because that's not where the references are stored; they are in the Operation.
The 16-thread case hitting this problem more often could be because it generates more state more quickly (I don't know the Operation implementation, so it's hard for me to say).
And increasing the heap size while keeping the problem set the same would make the problem appear to disappear, because you'd have more room for all this state.
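I don't know what your Operation looks like, so this is a purely hypothetical sketch of the kind of state accumulation I mean (cache and results are invented names, and thread-safety is ignored for brevity):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Operation: every perform() call leaves data behind on the heap
public class Operation {
    // long-lived/static collections survive even if the thread pool is destroyed
    private static final Map<Long, byte[]> cache = new HashMap<>();
    private final List<double[]> results = new ArrayList<>();

    public void perform(long key, double[] chunk) {
        double[] processed = chunk.clone();   // stand-in for the real data-parallel work
        results.add(processed);               // state kept between calls
        cache.put(key, new byte[1024]);       // grows without bound -> ever more live data for the GC to trace
    }
}

A heap dump (e.g. via jmap or VisualVM) taken when the slowdown occurs would show which objects are actually retaining all that memory.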
Upvotes: 5
Reputation: 3508
Since you are using locking, it is possible that one of your four threads attains the lock but is then context-switched out, perhaps to run the GC thread. The other threads can't make progress since they can't attain the lock. When the thread is switched back in, it completes the work in the critical section and relinquishes the lock, allowing only one other thread to attain it. So now you have two threads active. It is possible that while the second thread executes the critical section, the first thread does the next piece of data-parallel work but generates enough garbage to trigger the GC, and we're back where we started :)
P.S. This is just a best guess, since it is hard to figure out what is happening without any code snippets.
Upvotes: 0
Reputation: 533442
A total CPU utilisation of 100% implies that what you have written is effectively single-threaded, i.e. you may have any number of concurrent tasks, but due to locking, only one can execute at a time.
If you have high I/O you can get less than 400%, but it is unlikely you will get a round number for CPU utilisation, e.g. you might see 38%, 259%, 72%, 9%, etc. (It is also likely to jump around.)
A common problem is locking the data you are using too often. You need to consider how the code could be rewritten so that locking is performed for the briefest period and over the smallest portion of the overall work. Ideally, you want to avoid locking altogether.
Using multiple threads means you can use up to that many CPUs, but if your code prevents that, you are likely to be better off (i.e. faster) writing the code single-threaded, as that avoids the overhead of locking.
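For example (a sketch, not your code), instead of taking a shared lock at the end of every task, each task can publish its contribution through a class designed for many concurrent writers, such as LongAdder (Java 8+), or accumulate locally and combine once at the end:

import java.util.concurrent.atomic.LongAdder;

public class Totals {
    private final LongAdder total = new LongAdder(); // scales well with many writer threads

    // called at the end of each task; no synchronized block, so no shared lock to queue on
    public void addResult(long partialResult) {
        total.add(partialResult);
    }

    public long current() {
        return total.sum(); // sum of all contributions so far
    }
}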
Upvotes: 0
Reputation: 4228
I suggest that you use the YourKit thread analysis feature to understand the real behaviour. It will tell you exactly which threads are running, blocked or waiting, and why.
If you can't or don't want to purchase it, the next best option is to use VisualVM, which is bundled with the JDK, to do this analysis. It won't give you information as detailed as YourKit's. The following blog post can get you started with VisualVM: http://marxsoftware.blogspot.in/2009/06/thread-analysis-with-visualvm.html
Upvotes: 2