The Scientific Method

Reputation: 2436

What does it mean that intermediate operations are lazily executed whereas terminal operations are eagerly executed in the Java Stream API?

list.stream().filter( a-> a < 20 && a > 7).forEach(a -> System.out.println(a));

filter is lazily executed.

forEach is eagerly executed.

What does that mean?

Upvotes: 4

Views: 3560

Answers (5)

vishal_ratna

Reputation: 116

OK, so this is what the whole stream chain looks like, as others have pointed out.

Stream<Integer> s = Stream.of(1, 2, 3).map(i -> {
            System.out.println(i);
            return i;
        });

You can pass this stream to any method on a different thread and call any of the terminal operations, then this map will get executed.

Collection -> Stream -> (map) -> (filter) -> (map) -> collect(terminal)

When I was a newbie, it was extremely difficult to understand how something could be executed later when we had already called the method. Under the hood, when you call map, the stream API creates a delegate which will be called at a later point in time. As you keep calling operations one after another, it internally keeps building a chain of delegates. The chain is basically a doubly linked list. When you call any of the terminal operations, it uses the previous pointers in that list to walk back through the earlier nodes until it encounters null (the first operation that was called). It is precisely at that moment that it starts calling each delegate function in sequential order. Internally each operation is represented as a StatelessOp or StatefulOp. What happens is something like this (though I have simplified it):

node.operation.execute() -> node = node.next -> ..
node.operation.execute() ..... ...

Here, the operation is the delegate that was originally created.

I will now create something like an eager implementation of streams.

public interface IChain<Type> {

    <OutType> IChain<OutType> map(ActionFunction<Type,OutType> f);

}

public class Chain<T> implements IChain<T> {

    private final T source;
    private int depth;
    private Chain<?> prev;
    private Chain<?> next;

    public Chain(T object)
    {
        this.source = object;
        this.depth = 0;
        this.prev = this.next = null;
    }

    public Chain(T object, Chain<?> chain) {
        this.source = object;
        this.prev = chain;
        this.prev.next = this;
        this.depth = this.prev.depth + 1;
    }

    // It will result in eager execution of the propagation chain.
    @Override
    public <OutType> IChain<OutType> map(ActionFunction<T, OutType> f) {
        return new Chain<>(f.execute(source),this);
    }
}



public interface ActionFunction<IN, OUT> {

    OUT execute(IN in);
}

To use this,

Chain<?> c = (Chain<?>) new Chain<String>("Test String").map(s -> {
         ArrayList<String> list = new ArrayList<>();

         for(int i = 0; i<100 ; i++) {
             list.add(s);
         }
         return list;
     }).map(strings -> new StringBuilder(strings.get(0)));

Here, each map function does not wait for a terminal operation; it runs immediately when it is called. PS: the code itself does not do anything meaningful. It is only there to explain the concept.
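For contrast, a lazy counterpart might look like the sketch below. This is not how the JDK implements streams; LazyChain, its of() factory and its get() "terminal" method are hypothetical names made up for illustration. Each map only wraps the previous step in a new Supplier, and nothing runs until get() is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical lazy counterpart to the eager Chain; not JDK code.
class LazyChain<T> {

    // Records which mapping functions have actually run (for demonstration only).
    static final List<String> LOG = new ArrayList<>();

    private final Supplier<T> deferred;

    private LazyChain(Supplier<T> deferred) {
        this.deferred = deferred;
    }

    static <T> LazyChain<T> of(T source) {
        return new LazyChain<>(() -> source);
    }

    // "Intermediate" operation: wraps the previous step, runs nothing yet.
    <R> LazyChain<R> map(Function<? super T, ? extends R> f) {
        return new LazyChain<>(() -> f.apply(deferred.get()));
    }

    // "Terminal" operation: only now is the whole chain evaluated.
    T get() {
        return deferred.get();
    }

    public static void main(String[] args) {
        LazyChain<Integer> chain = LazyChain.of("Test String")
                .map(s -> { LOG.add("upper"); return s.toUpperCase(); })
                .map(s -> { LOG.add("length"); return s.length(); });

        System.out.println("Before get(): " + LOG);    // Before get(): []
        System.out.println("Result: " + chain.get());  // Result: 11
        System.out.println("After get(): " + LOG);     // After get(): [upper, length]
    }
}
```

Note that in this sketch every call to get() re-runs the whole chain; a real stream, by contrast, can only be consumed once.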

Hope this helps.

Upvotes: 2

Leo Aso

Reputation: 12493

Say you had the following operation.

list.stream()
    .map(a -> a * a)
    .filter(a -> a > 0 && a < 100)
    .map(a -> -a)
    .forEach(a -> System.out.println(a));

The intermediate operations are the maps and the filter; the terminal operation is the forEach. If intermediate operations were eagerly executed, then .map(a -> a * a) would immediately map the whole stream and pass the result to .filter(a -> a > 0 && a < 100), which would immediately filter it. The filtered result would then be passed to .map(a -> -a), which would map it and pass it to forEach, which would then print each element from the stream.

However, intermediate operations are not eager, instead they are lazy. What this means is that the sequence

list.stream()
    .map(a -> a * a)
    .filter(a -> a > 0 && a < 100)
    .map(a -> -a)

does not actually do anything right away. It just creates a new stream that remembers the operations it is supposed to carry out, but does not actually carry them out until it is time to actually produce a result. It is not until forEach tries to read a value from the stream that it then goes to the original stream, takes a value, maps it using a -> a * a, filters it, and if it passes the filter, maps it using a -> -a and then passes that value to forEach.
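A small sketch can make this per-element flow visible. The class and method names (PipelineOrderDemo, trace) are made up for illustration; the pipeline is the one from this answer, with each step recording when it fires instead of printing directly. It assumes a sequential stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PipelineOrderDemo {

    // Runs the pipeline and records the order in which the steps fire.
    static List<String> trace(List<Integer> input) {
        List<String> events = new ArrayList<>();
        input.stream()
             .map(a -> { events.add("square " + a); return a * a; })
             .filter(a -> { events.add("filter " + a); return a > 0 && a < 100; })
             .map(a -> { events.add("negate " + a); return -a; })
             .forEach(a -> events.add("print " + a));
        return events;
    }

    public static void main(String[] args) {
        // Prints, one per line:
        // square 3, filter 9, negate 9, print -9, square 20, filter 400
        trace(Arrays.asList(3, 20)).forEach(System.out::println);
    }
}
```

The first element travels through the whole pipeline before the second one is even squared, and 20 is dropped at the filter so it never reaches the later steps.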

It's like someone working in a restaurant who has been given the job of taking all the plates from the dirty pile, washing them, stacking them up and then giving them to the cook when he is ready to serve the food. If the person were eager, they would immediately take the whole pile of dirty plates, wash them all at once, and stack them up; then, when the cook wants the plates, the washer hands them over one by one for serving.

However, a lazy employee would realize that the cook only needs one plate at a time, and only when the food is ready to serve. So whenever the cook needs a plate, the employee just takes one plate from the pile, washes it and hands it to the cook, going one by one until the plates are all washed and all the food is served.

So what's the advantage?

Well one major advantage is that the lazy approach considerably improves latency. As you are probably aware, a single thread of a program can only do one thing at a time. Extending the analogy a bit further, imagine there are about 800 plates, but the cook actually had to wait for the washer to finish washing the dishes and then hand one to him. If the eager washer insisted on washing all the plates first before handing any over, the cook would have to wait for all 800 plates to be washed, then serve 800 meals at once, by which point all the angry customers would have left.

However, with the lazy washer, for each meal the cook wants to serve, he only has to wait for one plate. So if washing a plate takes 10 seconds and serving is nearly instant, in scenario 1 all meals would be served at once but only after waiting for more than two hours. But in scenario 2, each meal is served about 10 seconds apart. So even though it takes the same amount of time to serve all meals, scenario 2 is certainly more desirable.

I've stretched the analogy a bit thin here, but hopefully this helps you understand it better.

Upvotes: 19

Ousmane D.

Reputation: 56453

Lazily executed means the operation will only be executed when necessary.

Eagerly executed means the operations will execute immediately.

So when are lazy intermediate operations executed you may ask?

When there’s a terminal operation (Eager operation) applied to the pipeline.

So how can we know if an operation is intermediate (lazy) or terminal (eager)?

When the operation returns a Stream<T>, where T can be any type, it's an intermediate operation (lazy); if the operation returns anything else, e.g. void, int, boolean etc., it's a terminal (eager) operation.
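As a rough illustration of that rule of thumb, the sketch below spells out the return types explicitly. ReturnTypeDemo and countInRange are hypothetical names; the predicate is the one from the question. The primitive variants (IntStream, LongStream, DoubleStream) follow the same pattern.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class ReturnTypeDemo {

    static long countInRange(List<Integer> list) {
        // Intermediate operations return a Stream, so they can be chained further.
        Stream<Integer> filtered = list.stream().filter(a -> a < 20 && a > 7);
        Stream<String> mapped = filtered.map(a -> "value " + a);

        // A terminal operation returns something that is not a Stream (here a long).
        return mapped.count();
    }

    public static void main(String[] args) {
        System.out.println(countInRange(Arrays.asList(5, 8, 12, 25))); // 2
    }
}
```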

Upvotes: 3

LuCio

Reputation: 5183

The JavaDoc of Stream says:

Streams are lazy; computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed.

JavaDoc about intermediate operations:

They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed.

Since map is a lazy operation, the following code will print nothing:

Stream.of(1, 2, 3).map(i -> {
    System.out.println(i);
    return i;
});

This Stream has no terminal operation to trigger it, so the intermediate operations are never invoked.

Similarly, list.stream().filter( a-> a < 20 && a > 7) will return a Stream, but no element from the list has been filtered yet.

But even if a terminal operation is executed there is more about laziness:

Laziness also allows avoiding examining all the data when it is not necessary; for operations such as "find the first string longer than 1000 characters"

Lazy operations are executed if their execution is needed to determine the result of a Stream. And not all elements from a source have to be processed by a lazy operation.
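A minimal sketch of this short-circuiting, assuming a sequential stream: the filter records every element it examines, and findFirst() stops the traversal at the first match. ShortCircuitDemo, firstLongerThan and EXAMINED are names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class ShortCircuitDemo {

    // Elements that the filter actually looked at.
    static final List<String> EXAMINED = new ArrayList<>();

    static Optional<String> firstLongerThan(List<String> words, int minLength) {
        EXAMINED.clear();
        return words.stream()
                    .filter(w -> { EXAMINED.add(w); return w.length() > minLength; })
                    .findFirst();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "cccc", "dd", "eeeee");
        System.out.println(firstLongerThan(words, 3)); // Optional[cccc]
        System.out.println(EXAMINED);                  // [a, bb, cccc]
    }
}
```

The elements after "cccc" are never passed to the filter, even though a terminal operation was executed.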

JavaDoc on terminal operations:

In almost all cases, terminal operations are eager, completing their traversal of the data source and processing of the pipeline before returning.

Moreover only one terminal operation can be applied on a Stream.

After the terminal operation is performed, the stream pipeline is considered consumed, and can no longer be used;
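The following sketch shows what that looks like in practice: a second terminal operation on the same Stream object is rejected with an IllegalStateException. The class and method names are made up for the example.

```java
import java.util.stream.Stream;

class SingleUseDemo {

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondTerminalOperationFails() {
        Stream<Integer> s = Stream.of(1, 2, 3);
        s.forEach(i -> { });      // first terminal operation consumes the stream
        try {
            s.count();            // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTerminalOperationFails()); // true
    }
}
```

To run another terminal operation, create a fresh stream from the source.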

Going on with the example:

long count = Stream.of(1, 2, 3).map(i -> {
    System.out.println(i);
    return i;
}).count();

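To determine the count, the mapping is irrelevant. On Java 9 and later, count() may therefore skip the pipeline when it can compute the size directly from the source, so this code still prints nothing (on Java 8 the mappings run and the integers are printed). Either way, count() is a terminal operation, so the stream is processed and count is assigned the value 3.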

If we change the terminal operation to .min(Comparator.naturalOrder()); then all mappings are executed and we will see the printed integers.
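To compare the two terminal operations side by side, the sketch below counts how often the mapping function is invoked instead of printing. The names (CountVersusMinDemo, MAP_CALLS) are invented for the example. The number of calls observed for count() depends on the JDK version, so no expected value is given for it.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class CountVersusMinDemo {

    // How many times the mapping function ran during the last pipeline.
    static final AtomicInteger MAP_CALLS = new AtomicInteger();

    static long countMapped() {
        MAP_CALLS.set(0);
        return Stream.of(1, 2, 3)
                     .map(i -> { MAP_CALLS.incrementAndGet(); return i; })
                     .count();
    }

    static Optional<Integer> minMapped() {
        MAP_CALLS.set(0);
        return Stream.of(1, 2, 3)
                     .map(i -> { MAP_CALLS.incrementAndGet(); return i; })
                     .min(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        // The result is 3; the number of map calls may be 0 or 3 depending on the JDK.
        System.out.println("count = " + countMapped() + ", map calls = " + MAP_CALLS.get());

        // min() needs the mapped values, so the mapping runs for every element.
        System.out.println("min = " + minMapped().get() + ", map calls = " + MAP_CALLS.get());
    }
}
```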

Upvotes: 3

user10367961

Reputation:

It means that list.stream().filter( a-> a < 20 && a > 7) won't start executing until a terminal operation (such as forEach(a -> System.out.println(a))) is applied on the stream.

This has important performance implications, since if there is no terminal operation applied on the stream, there will be no wasted resources on filtering it (or applying any non-terminal operations for that matter).
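One way to check this yourself is to count how often the predicate is called before and after the terminal operation. This is only a sketch; NoTerminalNoWorkDemo, buildPipeline and PREDICATE_CALLS are hypothetical names, and the sample list is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class NoTerminalNoWorkDemo {

    // Number of times the filter predicate has been evaluated.
    static final AtomicInteger PREDICATE_CALLS = new AtomicInteger();

    // Builds the pipeline from the question but does not run it.
    static Stream<Integer> buildPipeline(List<Integer> list) {
        return list.stream().filter(a -> {
            PREDICATE_CALLS.incrementAndGet();
            return a < 20 && a > 7;
        });
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = buildPipeline(Arrays.asList(5, 8, 12, 25));
        System.out.println(PREDICATE_CALLS.get());    // 0: nothing has been filtered yet

        pipeline.forEach(a -> System.out.println(a)); // prints 8 and 12
        System.out.println(PREDICATE_CALLS.get());    // 4: one call per element
    }
}
```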

Upvotes: 1
