Reputation: 2523
I am working on fractal rendering software. The basic setup is that I have a big 2-dimensional array (the picture), in which values are incremented.
The simple rendering process is:
while (iteration < maxIteration) {
    update some pixel in array;
}
This is stupidly simple to parallelize: just have several threads do this simultaneously, since each thread will (very likely) work on different pixels at the same time, and even if there is an update collision in the array, that is fine. The array is shared among the threads!
However, to keep track of the total number of iterations done, I need iteration to be volatile, which I suspect slows down the code a little.
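To make the setup concrete, here is a minimal, self-contained sketch of what I mean (all names are invented for illustration; the sketch uses an AtomicLong for the shared counter, since a plain volatile long cannot be incremented atomically):

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of the setup described above; class and field names are invented.
public class ParallelRenderSketch {

    static final int WIDTH = 1920, HEIGHT = 1080, CHANNELS = 4;
    static final long[] picture = new long[WIDTH * HEIGHT * CHANNELS]; // shared by all threads
    static final AtomicLong iteration = new AtomicLong();              // total iterations so far

    public static void main(String[] args) throws InterruptedException {
        final long maxIteration = 50_000_000L;
        int nThreads = Runtime.getRuntime().availableProcessors();

        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                while (iteration.incrementAndGet() < maxIteration) {
                    // Stand-in for "update some pixel": the real chaos game picks the
                    // next pixel from the current one; here it is simply random.
                    int index = rnd.nextInt(picture.length);
                    picture[index]++; // racy increments are tolerated, as described above
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("Finished " + iteration.get() + " iterations.");
    }
}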
What baffles me is that I get virtually the same speed for 4 threads and 16 threads, even though I run this on a 64-core machine, which is verified by Runtime.getRuntime().availableProcessors().
One issue is that I have no control over where in the array the threads work, so the problem might be a bad case of cache misses? The array has the size of a full-HD image: 1920x1080x4 longs.
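For scale: 1920 x 1080 x 4 longs at 8 bytes each is roughly 66 MB, far more than the last-level cache of a typical CPU, so scattered accesses will mostly miss the cache.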
Thus, I am looking for possible issues and solutions to them, since I think this is a common type of problem.
Edit: The code I am trying to optimize is available here (sourceforge).
The class ThreadComputator represents one thread, and all of these threads do iterations. The number of iterations done is stored in the shared variable currentIteration, which (in the current code) is incremented in a synchronized block.
All threads write to the Histogram object, which is essentially a big array of doubles.
Writing to this does not need to be atomic, as overwrites will be rare, and the error is tolerated.
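As a rough, compressed sketch of that structure (this is not the actual sourceforge code, just an illustration of the shared counter and histogram described above):

import java.util.concurrent.ThreadLocalRandom;

// Compressed illustration of the structure described in the edit: one shared counter
// incremented under a lock, and unsynchronized writes into one shared double[] histogram.
public class HistogramSketch {

    static final double[] histogram = new double[1920 * 1080 * 4]; // the shared "Histogram"
    static final Object lock = new Object();
    static long currentIteration = 0;                              // guarded by lock

    static class Computator implements Runnable {                  // stands in for ThreadComputator
        final long maxIteration;
        Computator(long maxIteration) { this.maxIteration = maxIteration; }

        @Override public void run() {
            ThreadLocalRandom rnd = ThreadLocalRandom.current();
            while (true) {
                synchronized (lock) {                 // every iteration of every thread takes this lock
                    if (currentIteration >= maxIteration) return;
                    currentIteration++;
                }
                int index = rnd.nextInt(histogram.length);
                histogram[index] += 1.0;              // not atomic; rare lost updates are tolerated
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int nThreads = Runtime.getRuntime().availableProcessors();
        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(new Computator(20_000_000L));
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        System.out.println("Done: " + currentIteration + " iterations.");
    }
}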
Upvotes: 0
Views: 264
Reputation: 70909
I think you've answered your own question.
"Because I implement the chaos game algorithm. This means that the next pixel I need to work on depends non-deterministically on the current pixel."
And you have a memory system on your computer that is functionally random access; but the fastest performance is only possible if you have localized (within the cache pages) reads and writes.
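On typical hardware a cache line is only 64 bytes (8 adjacent longs), so with updates scattered over roughly 66 MB of data, almost every write will touch a line that is not currently cached.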
I'd re-implement your algorithm so that the pixel updates are buffered and applied in batches, with each batch written to memory together.
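For illustration, here is a minimal sketch of one way to do that; the buffer size and class names are arbitrary choices, and each flushed batch is counted as one "write time":

import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// One possible reading of the "write time" idea: each worker buffers the pixel
// indices it wants to bump, then flushes them in sorted order so the writes walk
// through memory roughly sequentially. All updates in one flush are treated as
// having happened at the same write time.
public class BatchedWriteSketch {

    static final double[] histogram = new double[1920 * 1080 * 4];
    static final AtomicLong writeTime = new AtomicLong();   // one tick per flushed batch

    static class Worker implements Runnable {
        final int batchSize = 4096;          // arbitrary; tune against your cache sizes
        final long batches;
        Worker(long batches) { this.batches = batches; }

        @Override public void run() {
            ThreadLocalRandom rnd = ThreadLocalRandom.current();
            int[] buffer = new int[batchSize];
            for (long b = 0; b < batches; b++) {
                // 1) Generate a batch of pixel indices without touching the histogram yet.
                for (int i = 0; i < batchSize; i++) {
                    buffer[i] = rnd.nextInt(histogram.length);
                }
                // 2) Sort the batch so the actual writes land close together in memory.
                Arrays.sort(buffer);
                // 3) Apply the whole batch as one "write time".
                for (int index : buffer) {
                    histogram[index] += 1.0;  // still racy across threads; tolerated as before
                }
                writeTime.incrementAndGet();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int nThreads = Runtime.getRuntime().availableProcessors();
        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(new Worker(1_000L));
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        System.out.println("Flushed " + writeTime.get() + " batches.");
    }
}

The point of the sort is that each flush sweeps through the array roughly in order, instead of jumping across the whole 66 MB for every single increment.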
Yes, it won't be 100% random anymore; however, you can mitigate that by counting the "write time" and assuming that all writes in the same write time occurred simultaneously. It will still thrash your memory pretty badly, but at least it will thrash it somewhat less.
Upvotes: 2