WIZARDELF

Reputation: 3885

The disappearing scalability

I made a toy program to test Java's concurrency performance. I put it here: https://docs.google.com/open?id=0B4e6u_s5iHT6MTNkZGM5ODQtNjZmYi00NTMwLWJlMjUtYzViOWZlMDM5NGVi

It accepts an integer argument that indicates how many threads to use. The program just finds prime numbers in a range. Commenting out lines 44~53 of the source gives a generic version, which shows nearly perfect scalability.

However, when I uncomment lines 44~53, which do some simple local computation, and set the variable s to a large enough value, the scalability disappears.

My question is whether my toy program uses shared data that could degrade its concurrency performance. How can the disappearing scalability be explained (I suspect low-level overhead, such as garbage collection, causes it)? And is there any solution for problems like this?
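For reference, the overall shape of the program is roughly like this (a simplified sketch only; the real code is at the link above, and the range-splitting and isPrime details here are illustrative):

import java.util.ArrayList;
import java.util.List;

public class PrimeBench {
    // Naive trial-division primality test.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        final int nThreads = Integer.parseInt(args[0]);
        final int range = 2000000;                  // total range to scan
        final int[] counts = new int[nThreads];     // one slot per thread, no sharing
        List<Thread> threads = new ArrayList<Thread>();
        long start = System.currentTimeMillis();
        for (int t = 0; t < nThreads; t++) {
            final int idx = t;
            final int lo = idx * (range / nThreads);
            final int hi = lo + range / nThreads;
            Thread th = new Thread(new Runnable() {
                public void run() {
                    int count = 0;
                    for (int n = lo; n < hi; n++) {
                        if (isPrime(n)) count++;
                    }
                    counts[idx] = count;            // written before join(), read after
                }
            });
            threads.add(th);
            th.start();
        }
        for (Thread th : threads) th.join();
        int total = 0;
        for (int c : counts) total += c;
        System.out.println("primes: " + total);
        System.out.println("main time: " + (System.currentTimeMillis() - start));
    }
}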

Upvotes: 0

Views: 90

Answers (1)

aroth

Reputation: 54806

The code in question is:

int s = 32343;
ArrayList<Integer> al = new ArrayList<Integer>(s);
for (int c = 0; c < s; c++) {
    al.add(c);          // autoboxes each int into a new Integer
}
Iterator<Integer> it = al.iterator();
while (it.hasNext()) {
    int c = it.next();  // unboxes again on the way out
    c = c++;            // note: this is a no-op; c is reassigned its old value
}

Of course this will degrade performance if you increase the value of s, since s controls how many things you put into the list. But that has very little to do with concurrency or scalability. If you write code telling the computer to waste time doing thousands or millions of throw-away computations, then of course your performance will degrade.

In more technical terms, the time complexity of this section of code is O(2n), i.e. O(n): it takes n operations to build the list, and then n more to iterate it and increment each value, where n is equal to s. So the bigger you make s, the longer it takes to execute this code.

In terms of why this would seem to make the benefits of concurrency smaller, have you considered the memory implications as s becomes larger? For instance, are you sure the Java heap is large enough to hold everything in memory without anything getting swapped out to disk? And even if nothing is getting swapped out, by making the length of the ArrayList larger you are giving the garbage collector more work to do when it runs (and possibly increasing the frequency at which it runs). Note that depending upon the implementation, the garbage collector may be pausing all of your threads each time it runs.
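As a quick way to check the heap question (not from the original program; just a sanity check you can drop in), you can ask the Runtime how much headroom the JVM actually has:

public class HeapCheck {
    public static void main(String[] args) {
        // Prints the JVM's heap limits; if committed memory stays close to the
        // maximum while the benchmark runs, the GC is likely under pressure.
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap:  " + rt.maxMemory() / (1024 * 1024) + " MB");
        System.out.println("committed: " + rt.totalMemory() / (1024 * 1024) + " MB");
        System.out.println("free:      " + rt.freeMemory() / (1024 * 1024) + " MB");
    }
}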

I wonder: if you allocate a single ArrayList instance per thread, at the time the thread is created, and then reuse it in the call to isPrime() instead of creating a new list each time, does that improve things?
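Something along these lines, for instance (a sketch only; the class and method names are mine, not the ones in the linked program, and the signature of isPrime() is illustrative):

import java.util.ArrayList;

public class PrimeWorker implements Runnable {
    // Allocated once when the worker is created, then reused (via clear())
    // on every isPrime() call, instead of constructing a fresh ArrayList
    // inside the method each time.
    private final ArrayList<Integer> scratch = new ArrayList<Integer>(32343);
    private final int lo, hi;

    public PrimeWorker(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    public void run() {
        int count = 0;
        for (int n = lo; n < hi; n++) {
            if (isPrime(n)) count++;
        }
        System.out.println(Thread.currentThread().getName() + ": " + count);
    }

    private boolean isPrime(int n) {
        scratch.clear();                    // reuses the same backing array
        for (int c = 0; c < 32343; c++) {   // the busy-work from the question
            scratch.add(c);                 // (the Integer boxing is still here)
        }
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }
}

That removes the per-call list allocation, though the boxed Integers are still created on every pass.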

Edit: Here's a fixed up version: http://pastebin.com/6vR7Uhez

It gives the following output on my machine:

------------------start------------------
1 threads' runtimes:
1       3766.0
maximum:    3766.0
main time:  3766.0
------------------end------------------
------------------start------------------
2 threads' runtimes:
1       897.0
2       2483.0
maximum:    2483.0
main time:  2483.0
------------------end------------------
------------------start------------------
4 threads' runtimes:
1       576.0
2       1473.0
3       568.0
4       1569.0
maximum:    1569.0
main time:  1569.0
------------------end------------------
------------------start------------------
8 threads' runtimes:
1       389.0
2       965.0
3       396.0
4       956.0
5       398.0
6       976.0
7       386.0
8       933.0
maximum:    976.0
main time:  978.0
------------------end------------------

...which shows nearly linear scaling as the number of threads is ramped up. The problems that I fixed were a combination of points raised above and in John Vint's (now deleted) answer, as well as incorrect/unnecessary use of ConcurrentLinkedQueue structures and some questionable timing logic.
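As for the timing logic, the pattern I'd use is to record System.nanoTime() inside each worker and join() all the threads before reading the results. The join() gives you the happens-before edge you need, so a plain array works and no ConcurrentLinkedQueue is required (again a sketch, with a stand-in workload):

public class TimingSketch {
    static volatile long sink;   // keeps the busy-work from being optimized away

    // Stand-in for the per-thread prime computation.
    static long doWork() {
        long sum = 0;
        for (int n = 0; n < 50000000; n++) sum += n;
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        final int nThreads = Integer.parseInt(args[0]);
        final long[] runtimes = new long[nThreads];   // one slot per thread
        Thread[] threads = new Thread[nThreads];
        long mainStart = System.nanoTime();
        for (int i = 0; i < nThreads; i++) {
            final int idx = i;
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    long t0 = System.nanoTime();
                    sink = doWork();
                    runtimes[idx] = System.nanoTime() - t0;
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();   // join() happens-before the reads of runtimes below
        }
        long mainTime = System.nanoTime() - mainStart;
        for (int i = 0; i < nThreads; i++) {
            System.out.println((i + 1) + "\t" + runtimes[i] / 1e6);
        }
        System.out.println("main time:\t" + mainTime / 1e6);
    }
}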

If we enable GC logging and profile both versions, we can see that the original version spends about 10x as much time running garbage collection as the modified version:

Original:  [ParNew: 17401K->750K(19136K), 0.0040010 secs] 38915K->22264K(172188K), 0.0040227 secs]
Modified:  [ParNew: 17024K->0K(19136K),   0.0002879 secs] 28180K->11156K(83008K),  0.0003094 secs]
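(For reference, GC logs like these come from running with the classic HotSpot logging flags, e.g. the following, where Main stands for whatever your entry class is; the flag names changed in later JVM versions:)

java -verbose:gc -XX:+PrintGCDetails Main 4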

That implies to me that between the constant list allocations and the Integer autoboxing, the original implementation was simply churning through too many objects. That put too much load on the GC and degraded your threads' performance to the point where there was no benefit (or even a negative benefit) to creating more threads.
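To make the churn concrete: every call allocated a new ArrayList plus ~32,000 throw-away Integer objects. If the busy-work doesn't genuinely need boxed values, a reusable primitive array produces no per-call garbage at all. A sketch, dropped into the per-thread worker shown earlier (this is one way to cut the allocations, not necessarily what the pastebin version does):

// Inside the per-thread worker class: allocated once, reused every call.
private final int[] scratch = new int[32343];

private void busyWork() {
    for (int c = 0; c < scratch.length; c++) {
        scratch[c] = c;             // primitive store, no Integer boxing
    }
    for (int c = 0; c < scratch.length; c++) {
        int v = scratch[c];         // no Iterator, no unboxing
        v++;                        // (the original "c = c++" was a no-op anyway)
    }
}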

So what all this says to me is that if you want to get good scaling out of concurrency in Java, whether your task is large or small, you have to pay attention to how you are using memory, be aware of potential hidden pitfalls and inefficiencies, and optimize away the inefficient bits.

Upvotes: 2
