bmargulies

Reputation: 100123

Tracking down thread conflicts in Java

Using YourKit, I metered an application, and identified the main CPU sink. I structured the computation to parallelize this via an ExecutorService with a fixed number of threads.

On a 24-core machine, the benefit of adding threads trails off very fast above 4. So, thought I, there must be some contention or locking going on around here, or IO latency, or something.

OK, I turned on the 'Monitor Usage' feature of YourKit, and the amount of blocked time shown in the worker threads is trivial. Eyeballing the thread state chart, the worker threads are nearly all 'green' (running) as opposed to yellow (waiting) or red (blocked).

CPU profiling still shows 96% of the time in a call tree that is inside the worker threads.

So something is using up real time. Could it be scheduling overhead?

In pseudo-code, you might model this as:

loop over blobs:
    submit tasks for a blob via invokeAll of executor
    do some single-threaded processing on the results
end loop over blobs
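For concreteness, the loop above could look roughly like the following in Java. This is only a sketch of the structure described in the question: the `work` method, the class name, and the summing of results are hypothetical stand-ins for the real computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlobLoop {
    // Hypothetical stand-in for the real per-task computation.
    static int work(int blob, int task) {
        return blob * 100 + task;
    }

    static long run(int blobs, int tasksPerBlob, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long total = 0;
        try {
            for (int b = 0; b < blobs; b++) {
                final int blob = b;
                List<Callable<Integer>> tasks = new ArrayList<>();
                for (int t = 0; t < tasksPerBlob; t++) {
                    final int task = t;
                    tasks.add(() -> work(blob, task));
                }
                // invokeAll blocks until every task of this blob has finished.
                List<Future<Integer>> results = pool.invokeAll(tasks);
                // Single-threaded step: the worker threads sit idle while this runs.
                for (Future<Integer> f : results) {
                    total += f.get();
                }
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return total;
    }

    public static void main(String[] args) {
        // Sizes taken from the test run described in the question.
        System.out.println(run(680, 13, 4));
    }
}
```

Note that in this shape the pool does nothing between the end of one `invokeAll` and the start of the next.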

In a test run, there are ~680 blobs, and ~13 tasks/blob. So each thread (assuming four) dispatches about 3 times per blob.
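As a rough sanity check on those numbers, here is a small sketch that computes how many sequential "waves" of tasks a blob needs for a given thread count, and the best-case speedup that implies. It assumes all tasks in a blob take equal time and ignores the single-threaded step, neither of which is known to hold here, so treat the result as an upper bound only.

```java
public class BlobWaves {
    // Sequential waves needed to run `tasks` equal-length tasks on `threads` workers.
    static int waves(int tasks, int threads) {
        return (tasks + threads - 1) / threads; // ceiling division
    }

    // Best-case speedup over one thread for a single blob (assumes equal-length tasks).
    static double idealSpeedup(int tasks, int threads) {
        return (double) tasks / waves(tasks, threads);
    }

    public static void main(String[] args) {
        int tasksPerBlob = 13; // figure from the test run
        for (int threads : new int[] {1, 2, 4, 8, 13, 24}) {
            System.out.printf("threads=%d waves=%d idealSpeedup=%.2f%n",
                    threads, waves(tasksPerBlob, threads),
                    idealSpeedup(tasksPerBlob, threads));
        }
    }
}
```

With 13 tasks per blob, this bound cannot improve beyond 13 threads, since every task already has its own thread at that point.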

Hardware: I've run small-scale tests on my MacBook Pro, and then on a big fat Dell. hwinfo on Linux there reports 24 different items for --cpu, each reported as

Intel(R) Xeon(R) CPU           X5680  @ 3.33GHz

Intel's website tells me that each has 6 cores and 12 threads; I suspect I have 4 of them.

Upvotes: 3

Views: 321

Answers (3)

Peter Lawrey

Reputation: 533660

Assuming you have 4 cores with 8 logical threads each, this means you have 4 real processing units, which can be shared across 32 threads. It also means that when you have 2-8 active threads on the same core, they have to compete for resources such as the CPU pipeline and the instruction and data caches.

This works best when you have many threads which have to wait for external resources like disk or network IO. If you have CPU intensive processes, you may find that one thread per core will use all the CPU power you have.
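A minimal sketch of sizing the pool that way is below. Note that `Runtime.availableProcessors()` reports logical processors, so the division by threads-per-core is an assumption about the hardware (here, 2 for a typical hyper-threaded part); the helper name is made up for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {
    // Estimate physical cores from the logical CPU count.
    // threadsPerCore is an assumption about the machine; the JVM does not report it.
    static int physicalCoreEstimate(int logicalCpus, int threadsPerCore) {
        return Math.max(1, logicalCpus / threadsPerCore);
    }

    public static void main(String[] args) {
        int logical = Runtime.getRuntime().availableProcessors();
        int workers = physicalCoreEstimate(logical, 2);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        System.out.println("logical=" + logical + " workers=" + workers);
        pool.shutdown();
    }
}
```

It is worth measuring throughput at a few pool sizes around this estimate rather than trusting it blindly.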

I have written a library which supports allocating threads to cores on Linux and Windows. If you have Solaris, it may be easy to port, as it supports JNI POSIX calls and JNA calls.

https://github.com/peter-lawrey/Java-Thread-Affinity

Upvotes: 2

srini.venigalla

Reputation: 5145

You have not completely parallelized the processing. You may not be submitting the next blob until the results of the previous blob are complete, so the blobs themselves are never processed in parallel.

If you can, try this way:

for each blob {
    create a runnable for blob processing; name it blobProcessor;
    create a runnable for blob results; name it resultsProcessor;
    submit blobProcessor;
    before blobProcessor finishes, submit resultsProcessor;
}
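One way to sketch this idea in Java is shown below, assuming Java 8+ and `CompletableFuture`. The `work` method and the summing of results are hypothetical placeholders. The results step still runs on a single thread, but the loop no longer waits for it before submitting the next blob's tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class PipelinedBlobs {
    // Hypothetical stand-in for the real per-task computation.
    static int work(int blob, int task) {
        return blob * 100 + task;
    }

    static long run(int blobs, int tasksPerBlob, int threads) {
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        ExecutorService resultsThread = Executors.newSingleThreadExecutor();
        AtomicLong total = new AtomicLong();
        List<CompletableFuture<Void>> done = new ArrayList<>();
        try {
            for (int b = 0; b < blobs; b++) {
                final int blob = b;
                List<CompletableFuture<Integer>> parts = new ArrayList<>();
                for (int t = 0; t < tasksPerBlob; t++) {
                    final int task = t;
                    parts.add(CompletableFuture.supplyAsync(() -> work(blob, task), workers));
                }
                // Once all parts of this blob finish, process them on the results thread.
                // The loop moves on to the next blob immediately.
                CompletableFuture<Void> all =
                        CompletableFuture.allOf(parts.toArray(new CompletableFuture[0]));
                done.add(all.thenRunAsync(() -> {
                    for (CompletableFuture<Integer> p : parts) {
                        total.addAndGet(p.join()); // already complete, join does not block
                    }
                }, resultsThread));
            }
            // Wait for every blob's results step before returning.
            CompletableFuture.allOf(done.toArray(new CompletableFuture[0])).join();
        } finally {
            workers.shutdown();
            resultsThread.shutdown();
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(run(680, 13, 4));
    }
}
```

This only works if the results of one blob are not needed to create the tasks for the next. Results are processed in completion order, which may differ from blob order.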

Also, please take a look at JetLang, which provides threadless concurrency using fibers.

Upvotes: 0

artbristol

Reputation: 32427

It's most likely not contention, though it's hard to say without more details. Profiling results can be misleading because Java reports threads as RUNNABLE when they're blocked on disk or network I/O. YourKit still counts that as CPU time.

Your best bet is to turn on CPU profiling and drill into what's taking the time in the worker threads. If it ends up mostly in java.io classes, you've still got disk or network latency.
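If you want a cross-check that doesn't depend on the profiler, you can compare per-thread CPU time against wall-clock time using the standard `ThreadMXBean`. The sketch below is only an illustration: the `measure` helper is made up, and `Thread.sleep` merely stands in for a task that spends its time waiting rather than computing.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuVsWall {
    // Returns {cpuNanos, wallNanos} for running `body` on the current thread.
    // cpuNanos is -1 if the JVM does not support per-thread CPU time.
    static long[] measure(Runnable body) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean supported = mx.isCurrentThreadCpuTimeSupported();
        long cpu0 = supported ? mx.getCurrentThreadCpuTime() : 0;
        long wall0 = System.nanoTime();
        body.run();
        long wall = System.nanoTime() - wall0;
        long cpu = supported ? mx.getCurrentThreadCpuTime() - cpu0 : -1;
        return new long[] {cpu, wall};
    }

    public static void main(String[] args) {
        long[] r = measure(() -> {
            try {
                Thread.sleep(200); // placeholder for a task that mostly waits
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.printf("cpu=%d ms wall=%d ms%n", r[0] / 1_000_000, r[1] / 1_000_000);
    }
}
```

Wrapping a worker task's body this way and finding CPU time far below wall time would suggest the thread is waiting on something, even if the profiler shows it as running.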

Upvotes: 1
