Optimization of Java Stream API functional interfaces for a highly loaded system

Question

We have methods that use the Java Stream API and are invoked very frequently, e.g. 10,000 to 20,000 times per second (a data streaming system). Consider the following simple test method (intentionally simplified; it does nothing useful):

public void test() {
    Stream.of(1, 2, 3, 4, 5)
            .map(i -> i * i)
            .filter(new SuperPredicate())
            .sorted(Comparator.comparing(i -> -i + 1, Comparator.nullsFirst(Comparator.naturalOrder())))
            .forEach(System.out::println);
}

class SuperPredicate implements Predicate<Integer> {
    public SuperPredicate() {
        System.out.println("SuperPredicate constructor");
    }
    @Override
    public boolean test(Integer i) {
        return i % 3 != 0;
    }
}

On each invocation of the test method, new instances of functional interfaces are created (in our example, SuperPredicate and the Comparator.nullsFirst() result). So with frequent method invocations, thousands of excess objects are created. I understand that creating an object takes only a few nanoseconds in Java, but under high load it might still increase the load on the GC and, as a result, affect performance.

As I see it, we could move the creation of such functional interfaces into private static final variables inside the same class. Since they are stateless, this would slightly decrease the load on the system. It's a kind of micro-optimization. Do we need to do this? Does the Java compiler or the JIT compiler somehow optimize such cases? Or does the compiler have options or optimization flags to improve such cases?

Holger · Accepted Answer

You can only store objects in static final fields for reuse when they don’t depend on variables of the surrounding context, let alone on potentially changing state.

In that case, there is no reason to create a class like SuperPredicate at all. You can simply use i -> i % 3 != 0 and get the behavior of remembering the first created instance for free. As explained in "Does a lambda expression create an object on the heap every time it's executed?", in the reference implementation, the instances created for non-capturing lambda expressions will be remembered and reused.
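As a minimal sketch of that reuse, compare a non-capturing lambda with one that captures a method parameter. The class and method names are made up for illustration, and the identity behavior is a property of the reference implementation rather than a guarantee of the language specification:

```java
import java.util.function.Predicate;

class LambdaReuseDemo {

    // Non-capturing: the body uses nothing but its own parameter.
    static Predicate<Integer> nonCapturing() {
        return i -> i % 3 != 0;
    }

    // Capturing: the body depends on the method parameter 'divisor',
    // so the resulting object has to carry that value.
    static Predicate<Integer> capturing(int divisor) {
        return i -> i % divisor != 0;
    }

    public static void main(String[] args) {
        // The reference implementation is expected to hand back the same
        // instance for the non-capturing lambda on every evaluation.
        System.out.println("non-capturing reused: " + (nonCapturing() == nonCapturing()));
        // A capturing lambda is usually a fresh object per evaluation.
        System.out.println("capturing reused:     " + (capturing(3) == capturing(3)));
    }
}
```

The capturing variant is also the case where a static final field would not help, because the function depends on a value that is only known at the call site.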

There is no need for a new comparator either. Leaving potential overflows aside, the function i -> -i + 1 just reverses the order due to the negation, whereas +1 has no effect on the order. Since the result of the expression -i + 1 can never be null, there is no need for Comparator.nullsFirst(Comparator.naturalOrder()). So you can replace the entire comparator with Comparator.reverseOrder(), with the same result but without any object instantiation, as reverseOrder() will return a shared singleton.
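A small check of that equivalence, as a sketch: it sorts the values that survive the filter in the example with both comparators. The class and method names are hypothetical, and the explicit type witness on comparing(…) is only there to keep type inference simple outside of a sorted(…) call:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ComparatorEquivalenceDemo {

    // The comparator from the question, unchanged in meaning.
    static List<Integer> sortedWithOriginal() {
        Comparator<Integer> original = Comparator.<Integer, Integer>comparing(
                i -> -i + 1, Comparator.nullsFirst(Comparator.naturalOrder()));
        return Stream.of(1, 4, 16, 25).sorted(original).collect(Collectors.toList());
    }

    // The suggested replacement.
    static List<Integer> sortedWithReverseOrder() {
        return Stream.of(1, 4, 16, 25)
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sortedWithOriginal());      // [25, 16, 4, 1]
        System.out.println(sortedWithReverseOrder());  // [25, 16, 4, 1]
    }
}
```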

As explained in "What is the equivalent lambda expression for System.out::println", the method reference System.out::println is capturing the current value of System.out. So the reference implementation does not reuse the instance that is referencing a PrintStream instance. If we change it to i -> System.out.println(i), it will be a non-capturing lambda expression which will re-read System.out on each function evaluation.
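The difference becomes observable when System.out is replaced after the functions have been created. The following sketch (class and method names are made up for illustration) redirects System.out into a buffer and shows which of the two consumers ends up writing there:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.function.Consumer;

class CapturedSystemOutDemo {

    /** Returns whatever was written to the replacement stream. */
    static String run() {
        PrintStream original = System.out;
        // Binds the PrintStream that System.out refers to right now.
        Consumer<Object> methodRef = System.out::println;
        // Reads the System.out field again on every call.
        Consumer<Object> lambda = o -> System.out.println(o);

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer, true));
        try {
            methodRef.accept("via method reference"); // still goes to the original stream
            lambda.accept("via lambda");              // goes to the replacement stream
        } finally {
            System.setOut(original);                  // always restore the real stream
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print("captured by the replacement stream: " + run());
    }
}
```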

So when we use

Stream.of(1, 2, 3, 4, 5)
    .map(i -> i * i)
    .filter(i -> i % 3 != 0)
    .sorted(Comparator.reverseOrder())
    .forEach(i -> System.out.println(i));

instead of your example code, we get the same result, but save four object instantiations, for the predicate, the consumer, the nullsFirst(…) comparator and the comparing(…) comparator.

To estimate the impact of this saving, consider what the pipeline itself allocates. Stream.of(…) is a varargs method, so a temporary array will be created for the arguments; then it will return an object representing the stream pipeline. Each intermediate operation creates another temporary object representing the changed state of the stream pipeline. Internally, a Spliterator implementation instance will be used. This makes a total of six temporary objects, just for describing the operation.

When the terminal operation starts, a new object representing the operation will be created. Each intermediate operation will be represented by a Consumer implementation having a reference to the next consumer, so the composed consumer can be passed to the Spliterator’s forEachRemaining method. Since sorted is a stateful operation, it will first store all elements in an intermediate ArrayList (which makes two objects), then sort the list and pass the elements on to the next consumer.
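The buffering done by sorted can be made visible with peek. This is only a sketch with a hypothetical class name; it records the order in which elements reach the stages before and after sorted in a sequential stream:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

class SortedBarrierDemo {

    /** Records when each element passes the stage before and after sorted(). */
    static List<String> trace() {
        List<String> events = new ArrayList<>();
        Stream.of(1, 2, 3, 4, 5)
                .map(i -> i * i)
                .filter(i -> i % 3 != 0)
                .peek(i -> events.add("before sorted: " + i))
                .sorted(Comparator.reverseOrder())
                .forEach(i -> events.add("after sorted: " + i));
        return events;
    }

    public static void main(String[] args) {
        // All "before sorted" entries appear first, because sorted() has to
        // collect every element before it can pass the first one downstream.
        trace().forEach(System.out::println);
    }
}
```

Without the sorted step, the "before" and "after" entries would alternate element by element, since the remaining stages just hand each element to the next consumer.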

This makes a total of twelve objects as the fixed overhead of the stream pipeline. The operation System.out.println(i) will convert each Integer object to a String object, which consists of two objects, as each String object is a wrapper around an array object. Since four elements pass the filter, this gives eight additional objects for this specific example, but more importantly, two objects per element, so using the same stream pipeline for a larger dataset will increase the number of objects created during the operation.

I think the actual number of temporary objects created before and behind the scenes renders the saving of four objects irrelevant. If allocation and garbage collection performance ever becomes relevant for your operation, you usually have to focus on the per-element costs rather than the fixed costs of the stream pipeline.
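If you want to see how fixed and per-element costs compare on your own JVM, a rough measurement is possible. The sketch below is an assumption-laden illustration, not a benchmark: it relies on the HotSpot-specific com.sun.management.ThreadMXBean, uses a hypothetical class name, and replaces the println call with a String conversion so that nothing is printed per element. The numbers depend on the JVM version and on JIT optimizations:

```java
import java.lang.management.ManagementFactory;
import java.util.Comparator;
import java.util.stream.IntStream;

class AllocationEstimateDemo {

    /**
     * Runs a pipeline shaped like the example over the values 1..n and returns the
     * bytes allocated by the current thread meanwhile, or -1 if the JVM cannot report it.
     */
    static long allocatedBytes(int n) {
        java.lang.management.ThreadMXBean standard = ManagementFactory.getThreadMXBean();
        if (!(standard instanceof com.sun.management.ThreadMXBean)) {
            return -1; // the extended bean is not available on this JVM
        }
        com.sun.management.ThreadMXBean bean = (com.sun.management.ThreadMXBean) standard;
        if (!bean.isThreadAllocatedMemorySupported() || !bean.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long threadId = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(threadId);
        // The sum is only computed so that every element is actually processed.
        long ignored = IntStream.rangeClosed(1, n).boxed()
                .map(i -> i * i)
                .filter(i -> i % 3 != 0)
                .sorted(Comparator.reverseOrder())
                .map(i -> String.valueOf(i))      // stands in for the String conversion in println
                .mapToLong(String::length)
                .sum();
        long after = bean.getThreadAllocatedBytes(threadId);
        return after - before;
    }

    public static void main(String[] args) {
        allocatedBytes(5); // warm-up, so lambda linkage and class loading are not measured
        System.out.println("5 elements:    " + allocatedBytes(5) + " bytes");
        System.out.println("5000 elements: " + allocatedBytes(5000) + " bytes");
    }
}
```

The first call is discarded on purpose: linking the lambda expressions and loading the stream classes allocates far more than a single small pipeline run, which would distort the comparison.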
