SarveshKaushal
SarveshKaushal

Reputation: 81

Why is multi-threaded with CompletableFuture slow as compared to single threaded code?

I am trying to improve the performance of current code in my project which is running in a single thread. The code is doing something like this: 1. Get first list of 10000000 objects. 2. Get second list of 10000000 objects. 3. Merge these two (after some changes) into a 3rd list.

   Instant s = Instant.now();
    List<Integer> l1 = getFirstList();
    List<Integer> l2 = getSecondList();
    List<Integer> l3 = new ArrayList<>();
    l3.addAll(l1);
    l3.addAll(l2);
    Instant e = Instant.now();
    System.out.println("Execution time: " + Duration.between(s, e).toMillis());

Here are the sample methods to get and combine lists

    private static List<Integer> getFirstList() {
    System.out.println("First list is being created by: "+ Thread.currentThread().getName());
    List<Integer> l = new ArrayList<>();
    for (int i = 0; i < 10000000; i++) {
        l.add(i);
    }
    return l;
}

private static List<Integer> getSecondList() {

    System.out.println("Second list is being created by: "+ Thread.currentThread().getName());
    List<Integer> l = new ArrayList<>();
    for (int i = 10000000; i < 20000000; i++) {
        l.add(i);
    }
    return l;
}
private static List<Integer> combine(List<Integer> l1, List<Integer> l2) {

    System.out.println("Third list is being created by: "+ Thread.currentThread().getName());
   ArrayList<Integer> l3 = new ArrayList<>();
   l3.addAll(l1);
   l3.addAll(l2);
    return l3;
}

I am trying to re-write above code as follows:

    ExecutorService executor = Executors.newFixedThreadPool(10);
    Instant start = Instant.now();
    CompletableFuture<List<Integer>> cf1 = CompletableFuture.supplyAsync(() -> getFirstList(), executor);
    CompletableFuture<List<Integer>> cf2 = CompletableFuture.supplyAsync(() -> getSecondList(), executor);

    CompletableFuture<Void> cf3 = cf1.thenAcceptBothAsync(cf2, (l1, l2) -> combine(l1, l2), executor);
    try {
        cf3.get();
    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (ExecutionException e) {
        e.printStackTrace();
    }
    Instant end = Instant.now();
    System.out.println("Execution time: " + Duration.between(start, end).toMillis());

    executor.shutdown();

Single threaded code is executing in 4-5 seconds while multi-threaded code is taking 6+ seconds to execute. Am I doing something wrong?

Upvotes: 2

Views: 2157

Answers (2)

Holger
Holger

Reputation: 298153

You are executing these method the first time, so they start in interpreted mode. To accelerate their first execution, the optimizer has to replace them while they’re running (called on-stack-replacement), which does not always give the same performance as when re-entering an optimized result. Doing this concurrently seems to be even worse, at least for Java 8, as I got entirely different results for Java 11.

So the first step would be inserting an explicit invocation, e.g. getFirstList(); getSecondList();, to see how it would perform when not being invoked for the first time.

Another aspect is the garbage collection. Some JVMs start with a small initial heap and will perform a full GC each time the heap is expanded, which has an impact on all threads.

So the second step would be starting with -Xms1G (or even better, -Xms2G), to start with a reasonable heap size for the amount of objects you’re gonna create.

But note that the 3rd step of adding the intermediate result lists to the final result list (which happens sequential in either case) has a significant impact on the performance.

So the 3rd step would be replacing the constructing of the final list with l3 = new ArrayList<>(l1.size() + l2.size()) for both variants, to ensure that the list has the appropriate initial capacity.

The combination of these steps resulted in less than a second for the sequential execution and less than half a second for the multi-threaded execution under Java 8.

For Java 11, which had a far better starting point, needing about one second only out-of-the-box, these improvements gave less dramatic speedup. It also seems that it has a much higher memory consumption for this code.

Upvotes: 2

Alexei Kaigorodov
Alexei Kaigorodov

Reputation: 13525

in the single threaded variant, l3.addAll(l1); l3.addAll(l2); takes elements of l1 and l2 from the processor cache (they were put there while executing getFirstList and getSecondList).

In parallel variant, method combine() runs on a different processor core with an empty cache and gets all elements from the main memory, which is much slower.

Upvotes: 1

Related Questions