Reputation: 2075
I'm trying to identify a class of Java applications that could benefit from the parallelStream API introduced in Java 8.
I'm aware of the numerous caveats of the API described in other SO posts:
Still, the API offers a way to exploit modern multicore machines with code that is not very intrusive, provided the Stream API is already in use, so it promises hassle-free multithreading at low development cost. I would therefore still like to think it can be useful in some scenarios.
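To make the "not very intrusive" point concrete, here is a minimal sketch of the kind of change I mean (an illustrative example of my own; the collection, the expensiveScore method and the numbers are made up):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelSwitchSketch {

    // Hypothetical CPU-bound work applied independently to each element.
    static double expensiveScore(String s) {
        double score = 0;
        for (int i = 0; i < 100_000; i++) {
            score += Math.sin(s.hashCode() + i);
        }
        return score;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("a", "b", "c", "d");

        // Existing sequential pipeline.
        List<Double> sequential = inputs.stream()
                .map(ParallelSwitchSketch::expensiveScore)
                .collect(Collectors.toList());

        // The non-intrusive change: parallelStream() instead of stream().
        List<Double> parallel = inputs.parallelStream()
                .map(ParallelSwitchSketch::expensiveScore)
                .collect(Collectors.toList());

        System.out.println(sequential.equals(parallel)); // true: same results, different execution
    }
}

The only source change is stream() versus parallelStream(); whether that change actually pays off is exactly what I'm unsure about.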
I'm thinking the application context thus has to be something like:
I searched on GitHub, but it's quite hard to find relevant examples of parallelStream usage that are not exercises or textbook examples (I'd welcome links to usage of the API in midsize or larger projects).
So which kind of applications were the Java language developers targeting with this API?
Would you agree with the above requirements on the application context for the API to be useful?
Upvotes: 7
Views: 828
Reputation: 17066
A similar question is asked in Should I always use a parallel stream when possible? Note the second answer is given by Brian Goetz, a Java language architect at Oracle who was involved in the design of the Stream API, so his answer may be considered authoritative.
Top answers are quick to point out that parallel streams carry additional coordination overhead and thus only improve performance in scenarios where the amount of processing per element is significant enough that the gain from parallel execution outweighs that initial overhead.
Unsurprisingly, as with any question of performance, the advice is to measure rather than guess. Start with a sequential stream, and if you have a large number of elements each requiring complex computation, measure the performance difference of switching to parallel streams.
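As a rough sketch of such a measurement (an illustrative example, not taken from the linked answer; the work method and the sizes are made up, and a harness such as JMH is preferable to System.nanoTime for serious numbers):

import java.util.stream.IntStream;

public class StreamTimingSketch {

    // Placeholder per-element work; the heavier it is, the more a parallel stream can gain.
    static double work(int i) {
        double x = i;
        for (int k = 1; k <= 50_000; k++) {
            x = Math.sqrt(x + k);
        }
        return x;
    }

    public static void main(String[] args) {
        int n = 10_000;

        long t0 = System.nanoTime();
        double seq = IntStream.range(0, n).mapToDouble(StreamTimingSketch::work).sum();
        long t1 = System.nanoTime();
        double par = IntStream.range(0, n).parallel().mapToDouble(StreamTimingSketch::work).sum();
        long t2 = System.nanoTime();

        System.out.printf("sequential: %d ms, parallel: %d ms (checksums %.1f / %.1f)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, seq, par);
    }
}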
Additional guidelines, such as those listed in the OP, may be helpful; but people are notoriously bad at identifying performance bottlenecks, so any guidelines are likely to fail eventually in the face of actual measurements.
Upvotes: 3
Reputation: 828
In my experience, I use parallelStream to break down a function that has to be called thousands of times, where the individual outputs do not depend on each other.
This is from example code I wrote to answer another Stack Overflow question. There is a function that needs to be called to compute the Euclidean distance for points taken from CSV rows:
public class Euclidian {

    // For every point except the last one, computes the squared Euclidean
    // distance between that point and the last point in the array.
    public Double[] euclidian(Double[][] data) {
        Double[] result = new Double[data.length - 1];
        for (int i = 0; i < result.length; i++) {
            result[i] =
                Math.pow(data[i][0] - data[data.length - 1][0], 2) +
                Math.pow(data[i][1] - data[data.length - 1][1], 2);
        }
        return result;
    }
}
Because every CSV row must be processed according to its order, but the outputs don't depend on one another (each result is only placed back at the position matching its index), I enhanced it using parallelStream:
IntStream
    .range(1, data.length - 1)
    .parallel()
    .forEach(i -> {
        // forEach runs concurrently, so add(...) must be thread-safe;
        // the index i is passed along so each result keeps its position.
        add(euclidian.euclidian(Arrays.copyOf(data, i + 1)), i);
    });
I have tested this against a CSV file with 1,049 rows. The parallelStream version is significantly faster than the original loop-based code (the bigger the input CSV, the bigger the speedup):
for (int i = 0; i < distanceTable.length - 1; ++i) {
    distanceTable[i] = new Double[i + 1];
    for (int j = 0; j <= i; ++j) {
        // Squared Euclidean distance between row i + 1 and row j of the data set.
        double distance = 0.0;
        for (int k = 0; k < DataSet[i + 1].length; ++k) {
            double difference = Double.parseDouble(DataSet[j][k]) - Double.parseDouble(DataSet[i + 1][k]);
            distance += difference * difference;
        }
        distanceTable[i][j] = distance;
    }
}
You may check my Git project.
Upvotes: 0
Reputation:
the application is running on client machines, where most of the time we can expect to have some available CPU cores, not on a server where resources are already contended
This prediction does not have any foundation. Both on desktop and server machines, there could be only your application running or there could be 1,000s of applications running.
There is no "application niche" in which parallel streams are automatically useful. You should use them only if you can verify, through quantitative or qualitative measurement, that performance actually improves and that their disadvantages do not matter too much.
They are easy only if you understand the concepts beneath them, and they can be applied only to a specific subset of problems.
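To illustrate one of those underlying concepts with a minimal sketch (my own example): by default a parallel stream runs its work on the JVM-wide common ForkJoinPool, so every parallel stream in the application competes for the same worker threads.

import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CommonPoolSketch {
    public static void main(String[] args) {
        // The shared pool is sized from the number of available cores.
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());

        // Each element may be handled by a common-pool worker or by the calling thread.
        IntStream.range(0, 4).parallel()
                .forEach(i -> System.out.println(i + " handled by " + Thread.currentThread().getName()));
    }
}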
I would consider using them only if:
Upvotes: 1
Reputation: 1083
This looks like a nice explanation of where and why parallel computing helps: https://computing.llnl.gov/tutorials/parallel_comp/#WhyUse. I personally see no interesting cases in user-centered web applications.
The fork/join framework is a really cool low-level API. Many other higher-level frameworks use it under the hood very successfully. I've used it for test data generation, cache bootstrapping, data processing, etc. In many cases you get a really good performance boost; in others it's just unnecessary overhead.
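For example, here is a minimal fork/join sketch (an illustration of my own, not taken from any of the projects mentioned above) that sums a range of numbers by splitting the work recursively:

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums the numbers in [from, to) by recursively splitting the range.
public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;
    private final long from;
    private final long to;

    SumTask(long from, long to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (long i = from; i < to; i++) {
                sum += i;
            }
            return sum;
        }
        long mid = (from + to) / 2;
        SumTask left = new SumTask(from, mid);
        SumTask right = new SumTask(mid, to);
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // compute the right half, then join
    }

    public static void main(String[] args) {
        long result = ForkJoinPool.commonPool().invoke(new SumTask(0, 1_000_000));
        System.out.println(result); // 499999500000
    }
}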
Upvotes: 3