Reputation: 2075
I'm trying to identify a class of Java applications that could benefit from the parallelStream API introduced in Java 8.
I'm aware of the numerous caveats of the API described in other SO posts:
Still, the API offers a way to exploit modern multicore machines with code that is not very intrusive, provided the Stream API is already in use, so it promises hassle-free multithreading at low development cost. I would therefore still like to think it can be useful in some scenarios.
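To make the "not very intrusive" point concrete, here is a minimal sketch of the kind of change I mean (an illustrative example of my own; the collection, the expensiveScore method and the numbers are made up):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelSwitchSketch {

    // Hypothetical CPU-bound work applied independently to each element.
    static double expensiveScore(String s) {
        double score = 0;
        for (int i = 0; i < 100_000; i++) {
            score += Math.sin(s.hashCode() + i);
        }
        return score;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("a", "b", "c", "d");

        // Existing sequential pipeline.
        List<Double> sequential = inputs.stream()
                .map(ParallelSwitchSketch::expensiveScore)
                .collect(Collectors.toList());

        // The non-intrusive change: parallelStream() instead of stream().
        List<Double> parallel = inputs.parallelStream()
                .map(ParallelSwitchSketch::expensiveScore)
                .collect(Collectors.toList());

        System.out.println(sequential.equals(parallel)); // true: same results, different execution
    }
}

The only source change is stream() versus parallelStream(); whether that change actually pays off is exactly what I'm unsure about.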
I'm thinking the application context thus has to be something like:
I searched on GitHub, but it's quite hard to find relevant examples of parallelStream usage that are not exercises or textbook examples (I'd welcome links to usage of the API in midsize or larger projects).
So which kind of applications were the Java language developers targeting with this API?
Would you agree with the above requirements on the application context for the API to be useful?
Upvotes: 7
Views: 828
Reputation: 17066
A similar question is asked in Should I always use a parallel stream when possible? Note the second answer is given by Brian Goetz, a Java language architect at Oracle who was involved in the design of the Stream API, so his answer may be considered authoritative.
Top answers are quick to point out that parallel streams carry additional coordination overhead and thus only improve performance in scenarios where the amount of processing per element is significant enough that the gain from parallel execution outweighs that initial overhead.
Unsurprisingly, as with any question of performance, the advice is to measure rather than guess. Start with a sequential stream, and if you have a large number of elements each requiring complex computation, measure the performance difference of switching to parallel streams.
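As a rough sketch of such a measurement (an illustrative example, not taken from the linked answer; the work method and the sizes are made up, and a harness such as JMH is preferable to System.nanoTime for serious numbers):

import java.util.stream.IntStream;

public class StreamTimingSketch {

    // Placeholder per-element work; the heavier it is, the more a parallel stream can gain.
    static double work(int i) {
        double x = i;
        for (int k = 1; k <= 50_000; k++) {
            x = Math.sqrt(x + k);
        }
        return x;
    }

    public static void main(String[] args) {
        int n = 10_000;

        long t0 = System.nanoTime();
        double seq = IntStream.range(0, n).mapToDouble(StreamTimingSketch::work).sum();
        long t1 = System.nanoTime();
        double par = IntStream.range(0, n).parallel().mapToDouble(StreamTimingSketch::work).sum();
        long t2 = System.nanoTime();

        System.out.printf("sequential: %d ms, parallel: %d ms (checksums %.1f / %.1f)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, seq, par);
    }
}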
Additional guidelines, such as those listed in the OP, may be helpful; but people are notoriously bad at identifying performance bottlenecks, so any guidelines are likely to fail eventually in the face of actual measurements.
Upvotes: 3
Reputation: 828
In my experience, I use parallelStream to break down a function that has to be called thousands of times, where the individual outputs do not depend on each other.
This is from example code I wrote to answer another Stack Overflow question. There is a function that needs to be called to compute the Euclidean distance for points taken from CSV rows:
public class Euclidian {

    // For every point except the last one, computes the squared Euclidean
    // distance between that point and the last point in the array.
    public Double[] euclidian(Double[][] data) {
        Double[] result = new Double[data.length - 1];
        for (int i = 0; i < result.length; i++) {
            result[i] =
                Math.pow(data[i][0] - data[data.length - 1][0], 2) +
                Math.pow(data[i][1] - data[data.length - 1][1], 2);
        }
        return result;
    }
}
Because every CSV row must be processed according to its order, but the outputs don't depend on one another (each result is only placed back at the position matching its index), I enhanced it using parallelStream:
IntStream
    .range(1, data.length - 1)
    .parallel()
    .forEach(i -> {
        // forEach runs concurrently, so add(...) must be thread-safe;
        // the index i is passed along so each result keeps its position.
        add(euclidian.euclidian(Arrays.copyOf(data, i + 1)), i);
    });
I have tested this against a CSV file with 1,049 rows. The parallelStream version is significantly faster than the original loop-based code (the bigger the input CSV, the bigger the speedup):
for (int i = 0; i < distanceTable.length - 1; ++i) {
    distanceTable[i] = new Double[i + 1];
    for (int j = 0; j <= i; ++j) {
        // Squared Euclidean distance between row i + 1 and row j of the data set.
        double distance = 0.0;
        for (int k = 0; k < DataSet[i + 1].length; ++k) {
            double difference = Double.parseDouble(DataSet[j][k]) - Double.parseDouble(DataSet[i + 1][k]);
            distance += difference * difference;
        }
        distanceTable[i][j] = distance;
    }
}
You may check my Git project.
Upvotes: 0
Reputation:
the application is running on client machines, where most of the time we can expect to have some available CPU cores, not on a server where resources are already contended
This prediction does not have any foundation. Both on desktop and server machines, there could be only your application running or there could be 1,000s of applications running.
There is no "application niche" in which parallel streams are automatically useful. You should use them only if you can verify, through quantitative or qualitative measurement, that performance actually improves and that their disadvantages do not matter too much.
They are easy only if you understand the concepts beneath them, and they can be applied only to a specific subset of problems.
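To illustrate one of those underlying concepts with a minimal sketch (my own example): by default a parallel stream runs its work on the JVM-wide common ForkJoinPool, so every parallel stream in the application competes for the same worker threads.

import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CommonPoolSketch {
    public static void main(String[] args) {
        // The shared pool is sized from the number of available cores.
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());

        // Each element may be handled by a common-pool worker or by the calling thread.
        IntStream.range(0, 4).parallel()
                .forEach(i -> System.out.println(i + " handled by " + Thread.currentThread().getName()));
    }
}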
I would consider using them only if:
Upvotes: 1
Reputation: 1083
This looks like a nice explanation of where and why parallel computing helps: https://computing.llnl.gov/tutorials/parallel_comp/#WhyUse. I personally see no interesting cases in user-centered web applications.
The fork/join framework is a really cool low-level API. Many other higher-level frameworks use it under the hood very successfully. I've used it for test data generation, cache bootstrapping, data processing, etc. In many cases you get a really good performance boost; in others it's just unnecessary overhead.
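For example, here is a minimal fork/join sketch (an illustration of my own, not taken from any of the projects mentioned above) that sums a range of numbers by splitting the work recursively:

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums the numbers in [from, to) by recursively splitting the range.
public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;
    private final long from;
    private final long to;

    SumTask(long from, long to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (long i = from; i < to; i++) {
                sum += i;
            }
            return sum;
        }
        long mid = (from + to) / 2;
        SumTask left = new SumTask(from, mid);
        SumTask right = new SumTask(mid, to);
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // compute the right half, then join
    }

    public static void main(String[] args) {
        long result = ForkJoinPool.commonPool().invoke(new SumTask(0, 1_000_000));
        System.out.println(result); // 499999500000
    }
}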
Upvotes: 3