Reputation: 1169

Java 8 Parallel processing and Lambda expression

I want to test how much faster Java8 parallel stream so I wrote a program .The program count the number of prime numbers in a given list of numbers. The program counts prime numbers in there ways:

using for loop;
by using lambda expression;
by using lambda expression(parallel stream).

Before executing the program I was expecting that parallel stream version should be faster. But the result is

Total prime no found 664579 in 4237 mili sec ----for loop version
Total prime no found 664579 in 2440 mili sec ----parallel stream
Total prime no found 664579 in 2166 mili sec ----lambda expression

My doubt is why parallel version is slower then lambda version

List<Integer> numbers = new ArrayList<>();
        for (int i = 0; i < 10000000; i++) {
            numbers.add(i);
        }
        Stopwatch stopwatch = Stopwatch.createStarted();
        int counter = 0;
        for (int number : numbers) {
            if (isPrime(number)) {
                counter++;
            }
        }
        stopwatch.stop();
        System.out.println("Total prime no found " + counter + " in " + stopwatch.elapsed(TimeUnit.MILLISECONDS) + " mili sec");

        stopwatch = Stopwatch.createStarted();
        long count1 = numbers.parallelStream().filter(n -> isPrime(n)).count();
        stopwatch.stop();
        System.out.println("Total prime no found " + count1 + " in " + stopwatch.elapsed(TimeUnit.MILLISECONDS) + " mili sec");

        stopwatch = Stopwatch.createStarted();
        long count2 = numbers.stream().filter(n -> isPrime(n)).count();
        System.out.println("Total prime no found " + count2 + " in " + stopwatch.elapsed(TimeUnit.MILLISECONDS) + " mili sec");
        stopwatch.stop();

The above program uses google Guava library to calculate time elapsed.

Upvotes: 2

Answers (2)

nosid

Reputation: 50044

Most likely, the problem is that during your test the JIT compiler (re-)compiles code. That means that your comparison isn't fair, because the later tests benefit from compilations caused by the earlier tests.

This is a frequently made mistake of micro benchmarks. There are a lot of articles explaining the problem, e.g. Robust Java benchmarking. If I add some code to warmup the JIT first, the results are expected. My main method looks as follows:

public static void main(String... args) {
    System.out.println("Warmup...");
    for (int i = 0; i < 5000; ++i) {
        run(Demo::testLoop, 5000);
        run(Demo::testStream, 5000);
        run(Demo::testParallel, 5000);
    }
    System.out.println("Benchmark...");
    int bound = 10000000;
    System.out.printf("Loop:     %s\n", run(Demo::testLoop, bound));
    System.out.printf("Stream:   %s\n", run(Demo::testStream, bound));
    System.out.printf("Parallel: %s\n", run(Demo::testParallel, bound));
}

And the output looks as follows:

Loop:     7.06s (664579)
Stream:   7.06s (664579)
Parallel: 3.84s (664579)

If you pass the option -XX:+PrintCompilation to the Java VM, you can see when and where the JIT kicks in, and that almost all compilations now happen during the warmup phase.

Note that parallel streams are not the best solution for this kind of parallelization, because the complexity of isPrime depends on the value. That means, the first half of the sequence requires significantly less work than the second half (and so on).

For reference, here are the remaining methods of my implementation:

public static boolean isPrime(int value) {
    for (int i = 2; i * i <= value; ++i)
        if (value % i == 0) return false;
    return true;
}

public static long testLoop(int bound) {
    long count = 0;
    for (int i = 2; i < bound; ++i)
        if (isPrime(i)) ++count;
    return count;
}

public static long testStream(int bound) {
    return IntStream.range(2, bound).filter(Demo::isPrime).count();
}

public static long testParallel(int bound) {
    return IntStream.range(2, bound).parallel().filter(Demo::isPrime).count();
}

public static String run(IntToLongFunction operation, int bound) {
    long start = System.currentTimeMillis();
    long count = operation.applyAsLong(bound);
    long millis = System.currentTimeMillis() - start;
    return String.format("%4.2fs (%d)", millis / 1000.0, count);
}

Upvotes: 3

edharned

Reputation: 1904

It is a well known fact that the F/J framework needs to warm up. The code is written in such a fashion that it only performs well when compiled. You also have to consider the time it takes to create threads. Having a warm up period in a production environment is subjective.

There is a lot of awful code in the framework to overcome this sluggish behavior when first starting.

Upvotes: 0

Java 8 Parallel processing and Lambda expression

Answers (2)

Related Questions