Maurice

Reputation: 7381

What is the difference between a stateful and a stateless lambda expression?

According to the OCP book, one must avoid stateful operations, otherwise known as stateful lambda expressions. The definition provided in the book is: 'a stateful lambda expression is one whose result depends on any state that might change during the execution of a pipeline.'

They provide an example where a parallel stream is used to add a fixed collection of numbers to a synchronized ArrayList using the .map() function.

The order in the ArrayList is completely random, and this should make one see that a stateful lambda expression produces unpredictable results at runtime. That's why it's strongly recommended to avoid stateful operations when using parallel streams, so as to remove any potential data side effects.

They don't show a stateless lambda expression that solves the same problem (adding numbers to a synchronized ArrayList), and I still don't get what the problem is with using a map function to populate an empty synchronized ArrayList with data. What exactly is the state that might change during the execution of a pipeline? Are they referring to the ArrayList itself? For instance, when another thread adds other data to the ArrayList while the parallel stream is still in the process of adding the numbers, thus altering the eventual result?

Maybe someone can provide me with a better example that shows what a stateful lambda expression is and why it should be avoided. That would be very much appreciated.

Thank you

Upvotes: 9

Views: 11311

Answers (5)

Kusum

Reputation: 271

A stateful lambda expression is one whose result depends on any state that might change during the execution of a stream pipeline.

Let's understand this with an example here:

    List<Integer> list = Arrays.asList(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15);
    List<Integer> result = new ArrayList<Integer>();

    list.parallelStream().map(s -> {
        synchronized (result) {
            if (result.size() < 10) {
                result.add(s);
            }
        }
        return s;
    }).forEach(e -> {});
    System.out.println(result);

If you run this code five times, the output could be different each time. The reason is that the lambda expression inside map updates the result list, and whether it adds an element depends on the size of that list at that moment. Which ten elements arrive first depends on how the threads processing the substreams are scheduled, and that changes from run to run.

For a better understanding of parallel streams: Parallel computing involves dividing a problem into subproblems, solving those problems simultaneously (in parallel, with each subproblem running in a separate thread), and then combining the results of the solutions to the subproblems. When a stream executes in parallel, the Java runtime partitions the stream into multiple substreams. Aggregate operations iterate over and process these substreams in parallel and then combine the results.
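
As a rough sketch of a stateless alternative (the class and method names below are made up for illustration): if the goal is simply to keep at most ten elements, the stream can do the counting itself with limit(), so the lambda never has to look at a shared list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration.
class StatelessFirstTen {

    // No lambda touches shared state: limit() does the counting
    // and collect() builds the result list.
    static List<Integer> firstTen(List<Integer> source) {
        return source.parallelStream()
                .limit(10)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
        // prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        System.out.println(firstTen(list));
    }
}
```

Because a List is an ordered source, limit(10) keeps the first ten elements in encounter order, so the output is the same on every run, even in parallel.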

Hope this helps!!!

Upvotes: 1

A stateful lambda expression is one whose result depends on any state that might change during the execution of a pipeline. On the other hand, a stateless lambda expression is one whose result does not depend on any state that might change during the execution of a pipeline.

Source: OCP: Oracle Certified Professional Java SE 8 Programmer II Study Guide: Exam 1Z0-809 by Jeanne Boyarsky, Scott Selikoff

    List<Integer> data = Collections.synchronizedList(new ArrayList<>());

    Arrays.asList(1, 2, 3, 4, 5, 6, 7).parallelStream()
            .map(i -> {
                data.add(i);
                return i;
            }) // AVOID STATEFUL LAMBDA EXPRESSIONS!
            .forEachOrdered(i -> System.out.print(i + " "));

    System.out.println();
    for (int e : data) {
        System.out.print(e + " ");
    }

Possible Output:

1 2 3 4 5 6 7 
1 7 5 2 3 4 6 

It is strongly recommended that you avoid stateful operations when using parallel streams, so as to remove any potential data side effects. In fact, they should generally be avoided in serial streams wherever possible, since they prevent your streams from taking advantage of parallelization.

Upvotes: 3

jspek

Reputation: 446

Here is an example where a stateful operation returns a different result each time:

    public static void main(String[] args) {

        Set<Integer> seen = new HashSet<>();

        IntStream stream = IntStream.of(1, 2, 3, 1, 2, 3);

        // Stateful lambda expression
        IntUnaryOperator mapUniqueLambda = (int i) -> {
            if (!seen.contains(i)) {
                seen.add(i);
                return i;
            }
            else {
                return 0;
            }
        };

        int sum = stream.parallel()
                .map(mapUniqueLambda)
                .peek(i -> System.out.println("Stream member: " + i))
                .sum();

        System.out.println("Sum: " + sum);
    }

In my case when I ran the code I got the following output:

Stream member: 1
Stream member: 0
Stream member: 2
Stream member: 3
Stream member: 1
Stream member: 2
Sum: 9

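Why did I get 9 as the sum, rather than 6, if I'm tracking seen values in a HashSet?
The answer: different threads took different parts of the IntStream. For example, the two 1s and the two 2s ended up on different threads, and each thread's contains check ran before it could see the other thread's add, so both occurrences were kept. On top of that, HashSet is not thread-safe, so calling add on it from several threads is unsafe in itself.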
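
For comparison, here is a minimal sketch of a stateless way to get the sum of the unique values (the class and method names are hypothetical). Instead of remembering seen values in a lambda, it leaves the bookkeeping to the stream's own distinct() operation.

```java
import java.util.stream.IntStream;

// Hypothetical class name, used only for this illustration.
class StatelessUniqueSum {

    // distinct() tracks duplicates inside the stream library,
    // so no lambda in this pipeline reads or writes shared state.
    static int uniqueSum(int... values) {
        return IntStream.of(values)
                .parallel()
                .distinct()
                .sum();
    }

    public static void main(String[] args) {
        // prints "Sum: 6"
        System.out.println("Sum: " + uniqueSum(1, 2, 3, 1, 2, 3));
    }
}
```

The result no longer depends on which thread sees which element first, so it is 6 on every run.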

Upvotes: 3

Eugene

Reputation: 120858

The first problem is this:

    List<Integer> list = new ArrayList<>();

    List<Integer> result = Stream.of(1, 2, 3, 4, 5, 6)
            .parallel()
            .map(x -> {
                list.add(x);
                return x;
            })
            .collect(Collectors.toList());

    System.out.println(list);

You have no idea what the result will be here, since you are adding elements to a non-thread-safe collection, ArrayList.

But even if you do:

    List<Integer> list = Collections.synchronizedList(new ArrayList<>());

and perform the same operation, the list has no predictable order. Multiple threads add to this synchronized collection. By using the synchronized collection you guarantee that all elements are added (as opposed to the plain ArrayList), but the order in which they will be present is unknown.

Notice that list has no order guarantees whatsoever; the order it ends up in is called processing order. Meanwhile, result is guaranteed to be [1, 2, 3, 4, 5, 6] for this particular example.

Depending on the problem, you can usually get rid of the stateful operations; for your example, returning a synchronized List would look like this:

    Stream.of(1, 2, 3, 4, 5, 6)
            .filter(x -> x > 2) // for example a filter is present
            .collect(Collectors.collectingAndThen(Collectors.toList(),
                    Collections::synchronizedList));
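
To make the difference between the two orders easier to try out, here is a small runnable sketch (the class and method names are mine, chosen for illustration) that puts the side-effect version and a collector-based version next to each other.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration.
class ProcessingVsEncounterOrder {

    // Stateful: the lambda writes to a shared synchronized list.
    // All six elements arrive, but in processing order.
    static List<Integer> sideEffectList() {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        Stream.of(1, 2, 3, 4, 5, 6)
                .parallel()
                .map(x -> {
                    list.add(x);
                    return x;
                })
                .collect(Collectors.toList());
        return list;
    }

    // Stateless: the collector builds the list, so encounter order is kept,
    // and the finished list is wrapped in a synchronized view.
    static List<Integer> collected() {
        return Stream.of(1, 2, 3, 4, 5, 6)
                .parallel()
                .collect(Collectors.collectingAndThen(Collectors.toList(),
                        Collections::synchronizedList));
    }

    public static void main(String[] args) {
        System.out.println(sideEffectList()); // some permutation of 1..6
        System.out.println(collected());      // prints [1, 2, 3, 4, 5, 6]
    }
}
```

Running main a few times should show the first line changing between runs while the second stays the same.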

Upvotes: 5

Jeremy Grand

Reputation: 2370

To try to give an example, let's consider the following Consumer (note: the usefulness of such a function is beside the point here):

    public static class StatefulConsumer implements IntConsumer {

        private static final Integer ARBITRARY_THRESHOLD = 10;
        private boolean flag = false;
        private final List<Integer> list = new ArrayList<>();

        @Override
        public void accept(int value) {
            if (flag) {   // exit condition
                return;
            }
            if (value >= ARBITRARY_THRESHOLD) {
                flag = true;
            }
            list.add(value);
        }

    }

It's a consumer that will add items to a List (let's ignore how to get the list back, as well as thread safety) and has a flag (to represent the statefulness).

The logic behind this would be that once the threshold has been reached, the consumer should stop adding items.

What your book was trying to say was that because there is no guaranteed order in which the function will have to consume the elements of the Stream, the output is non-deterministic.

Thus, they advise you to only use stateless functions, meaning functions that always produce the same result for the same input.
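
As a contrast, here is a sketch of a stateless function under the same threshold idea (class and method names are hypothetical). It is not equivalent to StatefulConsumer: it drops every value at or above the threshold instead of stopping after the first one. What it illustrates is that the predicate looks only at its argument, so the outcome does not depend on the order in which elements are processed.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical class name, used only for this illustration.
class StatelessThreshold {

    private static final int ARBITRARY_THRESHOLD = 10;

    // The predicate depends only on v and a constant: no flag, no shared list.
    static List<Integer> belowThreshold(IntStream values) {
        return values
                .filter(v -> v < ARBITRARY_THRESHOLD)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> sequential = belowThreshold(IntStream.rangeClosed(1, 20));
        List<Integer> parallel = belowThreshold(IntStream.rangeClosed(1, 20).parallel());
        System.out.println(sequential);                  // prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(parallel.equals(sequential)); // prints true
    }
}
```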

Upvotes: 3
