Reputation: 1324
I was playing around with infinite streams and made this program for benchmarking. Basically, the bigger the number you provide, the faster it will finish. However, I was amazed to find that using a parallel stream resulted in dramatically worse performance than a sequential stream. Intuitively, one would expect an infinite stream of random numbers to be generated and evaluated much faster in a multi-threaded environment, but this appears not to be the case. Why is this?
final int target = Integer.parseInt(args[0]);
if (target <= 0) {
    System.err.println("Target must be between 1 and 2147483647");
    return;
}

final long startTime = System.currentTimeMillis();
System.out.println(
        IntStream.generate(() -> new Double(Math.random() * 2147483647).intValue())
                //.parallel()
                .filter(i -> i <= target)
                .findFirst()
                .getAsInt()
);
final long endTime = System.currentTimeMillis();
System.out.println("Execution time: " + (endTime - startTime) + " ms");
Upvotes: 7
Views: 2341
Reputation: 1324
Following suggestions from various answers, I think I've fixed it. I'm not sure what the exact bottleneck was, but on an i5-4590T the parallel version of the following code outperforms the sequential variant. For brevity, I've included only the relevant parts of the (refactored) code:
static IntStream getComputation() {
    // ThreadLocalRandom avoids the contention on the single shared
    // java.util.Random instance that backs Math.random()
    return IntStream
            .generate(() -> ThreadLocalRandom.current().nextInt(2147483647));
}

static void computeSequential(int target) {
    for (int loop = 0; loop < target; loop++) {
        final int result = getComputation()
                .filter(i -> i <= target)
                .findAny()
                .getAsInt();
        System.out.println(result);
    }
}

static void computeParallel(int target) {
    IntStream.range(0, target)
            .parallel()
            .forEach(loop -> {
                final int result = getComputation()
                        .parallel()
                        .filter(i -> i <= target)
                        .findAny()
                        .getAsInt();
                System.out.println(result);
            });
}
EDIT: I should also note that I put everything in a loop to get longer running times.
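For completeness, a minimal driver along these lines could be used to time both variants against each other; this harness is my own sketch, not part of the original post:

public static void main(String[] args) {
    final int target = Integer.parseInt(args[0]);

    // Time the sequential variant.
    long start = System.currentTimeMillis();
    computeSequential(target);
    System.out.println("Sequential: " + (System.currentTimeMillis() - start) + " ms");

    // Time the parallel variant.
    start = System.currentTimeMillis();
    computeParallel(target);
    System.out.println("Parallel:   " + (System.currentTimeMillis() - start) + " ms");
}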
Upvotes: 0
Reputation: 12051
I totally agree with the other comments and answers, but your test does indeed behave strangely when the target is very low. On my modest laptop the parallel version is on average about 60x slower with very low targets. This extreme difference cannot be explained by the overhead of parallelization in the stream API alone, so I was also amazed :-). IMO the culprit lies here:
Math.random()
Internally, this call relies on a single shared instance of java.util.Random. The documentation of Random states:
Instances of java.util.Random are threadsafe. However, the concurrent use of the same java.util.Random instance across threads may encounter contention and consequent poor performance. Consider instead using ThreadLocalRandom in multithreaded designs.
So I think that the really poor performance of the parallel execution compared to the sequential one is explained by thread contention in Random rather than by any other overhead. If you use ThreadLocalRandom instead (as recommended in the documentation), the performance difference will not be so dramatic. Another option would be to implement a more advanced number supplier.
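For illustration, here is a minimal sketch of the questioner's pipeline with Math.random() swapped for ThreadLocalRandom; the surrounding class and argument handling are assumed from the question, not part of this answer:

import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class ParallelStreamTest {
    public static void main(String[] args) {
        final int target = Integer.parseInt(args[0]);
        final long startTime = System.currentTimeMillis();
        System.out.println(
                // ThreadLocalRandom.current() returns the generator bound to the
                // calling thread, so parallel workers no longer serialize on a
                // single shared seed update.
                IntStream.generate(() -> ThreadLocalRandom.current().nextInt(Integer.MAX_VALUE))
                        .parallel()
                        .filter(i -> i <= target)
                        .findFirst()
                        .getAsInt()
        );
        final long endTime = System.currentTimeMillis();
        System.out.println("Execution time: " + (endTime - startTime) + " ms");
    }
}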
Upvotes: 10
Reputation: 533880
The cost of passing work to multiple threads is expensive, especially the first time you do it. This cost is fairly fixed, so even if your task is trivial, the overhead is relatively high.
One of the problems you have is that highly inefficient code is a very poor way to determine how well a solution performs. Also, how code runs the first time and how it runs after a few seconds can often differ by a factor of 100 (sometimes much more). I suggest starting from an example which is already optimal and only then attempting to use multiple threads.
e.g.
long start = System.nanoTime();
int value = (int) (Math.random() * (target + 1L));
long time = System.nanoTime() - start;
// don't time IO as it is sooo much slower
System.out.println(value);
Note: this will not be efficient until the code has warmed up and been compiled, i.e. ignore the first 2-5 seconds this code is run.
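As a rough sketch of what that warm-up might look like in practice (the iteration count is an arbitrary assumption, and the sink variable exists only so the JIT cannot discard the warm-up work):

public class WarmupTiming {
    public static void main(String[] args) {
        final int target = Integer.parseInt(args[0]);

        // Warm-up phase: exercise the code path until the JIT has compiled it,
        // accumulating the results so the loop cannot be optimized away.
        long sink = 0;
        for (int i = 0; i < 100_000; i++) {
            sink += (int) (Math.random() * (target + 1L));
        }

        // Timed phase: measure only the computation, not the IO.
        long start = System.nanoTime();
        int value = (int) (Math.random() * (target + 1L));
        long time = System.nanoTime() - start;

        System.out.println(value + " (sink: " + sink + ")");
        System.out.println("Took " + time + " ns");
    }
}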
Upvotes: 0