Reputation: 1357
I've come across a rule in Sonar which says:
A key difference with other intermediate Stream operations is that the Stream implementation is free to skip calls to peek() for optimization purpose. This can lead to peek() being unexpectedly called only for some or none of the elements in the Stream.
It's also mentioned in the Javadoc, which says:
This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline
In which cases can java.util.stream.Stream.peek() be skipped? Is it related to debugging?
Upvotes: 31
Views: 5117
Reputation: 28988
peek() is an intermediate operation, and it expects a consumer that performs an action (a side effect) on the elements of the stream.
When a stream pipeline doesn't contain intermediate operations that can change the number of elements in the stream (like takeWhile, filter, limit, etc.), ends with the terminal operation count(), and the stream source allows evaluating the number of elements in it, then count() simply interrogates the source and returns the result. All intermediate operations get optimized away.
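A small sketch of that shortcut, assuming a JDK 9+ (OpenJDK) runtime; the class and method names are made up for illustration. The first pipeline keeps the size known, so count() can answer from the source without pulling any element through peek(); the second adds a filter(), so every element has to be traversed.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CountShortcut {

    // Pipeline without size-changing operations: count() may skip peek() and map().
    static int peekCallsWithoutFilter() {
        AtomicInteger calls = new AtomicInteger();
        long count = List.of(1, 2, 3).stream()
                .peek(n -> calls.incrementAndGet())
                .map(n -> n * 10)            // map() does not change the element count
                .count();
        System.out.println("count=" + count + ", peek calls=" + calls.get());
        return calls.get();
    }

    // filter() makes the size unknown, so each element must go through peek().
    static int peekCallsWithFilter() {
        AtomicInteger calls = new AtomicInteger();
        long count = List.of(1, 2, 3).stream()
                .peek(n -> calls.incrementAndGet())
                .filter(n -> n > 1)
                .count();
        System.out.println("count=" + count + ", peek calls=" + calls.get());
        return calls.get();
    }

    public static void main(String[] args) {
        peekCallsWithoutFilter(); // on JDK 9+ this typically reports 0 peek calls
        peekCallsWithFilter();    // reports 3 peek calls (count=2)
    }
}
```

On Java 8 the first method would still invoke peek() for all three elements, since the count() shortcut did not exist yet.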
Note: this optimization of the count() operation, which exists since Java 9 (see the API Note), is not directly related to peek(); it affects every intermediate operation that doesn't change the number of elements in the stream (currently these include map(), sorted(), and peek()).
peek() occupies a very special niche among the intermediate operations.
By its nature, peek() differs both from other intermediate operations like map() and from the terminal operations that cause side effects (as peek() does), namely forEach() and forEachOrdered(), which perform a final action for each element that reaches them.
The key point is that peek() doesn't contribute to the result of the stream execution. It never affects the result produced by the terminal operation, whether that result is a value or a final action.
In other words, if we removed peek() from the pipeline, the terminal operation would not be affected.
Both the documentation of peek() and the Stream API documentation warn that its action could be elided, so you shouldn't rely on it.
A quote from the documentation of peek():
In cases where the stream implementation is able to optimize away the production of some or all the elements (such as with short-circuiting operations like findFirst, or in the example described in count()), the action will not be invoked for those elements.
A quote from the java.util.stream package documentation, section Side-effects:
The eliding of side-effects may also be surprising. With the exception of terminal operations forEach and forEachOrdered, side-effects of behavioral parameters may not always be executed when the stream implementation can optimize away the execution of behavioral parameters without affecting the result of the computation.
Here's an example of a stream where none of the intermediate operations gets elided except peek():
Stream.of(1, 2, 3)
        .parallel()
        .peek(System.out::println)
        .skip(1)
        .map(n -> n * 10)
        .forEach(System.out::println);
In this pipeline peek() precedes skip(), so you might expect it to display every element from the source on the console. However, that doesn't happen (element 1 will not be printed). Due to the nature of peek(), it can be optimized away without breaking the code, i.e. without affecting the terminal operation.
That's why the documentation explicitly states that this operation exists mainly to support debugging, and it should not be given an action that needs to be executed under all circumstances.
Upvotes: 13
Reputation: 44398
Not only peek but also map can be skipped. This is done for the sake of optimization.
For example, when the terminal operation count() is called, it makes no sense to peek or map the individual items, as such operations do not change the number of items present.
Here are two examples:
1. Map and peek are not skipped because the filter can change the number of items beforehand.
long count = Stream.of("a", "aa")
        .peek(s -> System.out.println("#1"))
        .filter(s -> s.length() < 2)
        .peek(s -> System.out.println("#2"))
        .map(s -> {
            System.out.println("#3");
            return s.length();
        })
        .count();
System.out.println(count);
Output:
#1
#2
#3
#1
1
2. Map and peek are skipped (on Java 9 and later) because the number of items is unchanged.
long count = Stream.of("a", "aa")
        .peek(s -> System.out.println("#1"))
        //.filter(s -> s.length() < 2)
        .peek(s -> System.out.println("#2"))
        .map(s -> {
            System.out.println("#3");
            return s.length();
        })
        .count();
System.out.println(count);
Output:
2
Important: behavioral parameters should have no side effects (the ones above do, but only for the sake of the example).
Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.
The following implementation is dangerous. Assuming the callRestApi method performs a REST call, that call may never be performed, because count() is free to skip the side-effecting map() when the stream size is already known.
long count = Stream.of("url1", "url2")
.map(string -> callRestApi(HttpMethod.POST, string))
.count();
/**
* Performs a REST call
*/
public String callRestApi(HttpMethod httpMethod, String url);
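To make the danger visible, here is a sketch in which callRestApi is a hypothetical stub that only counts how often it is invoked (the HttpMethod parameter is dropped to keep the example self-contained). It assumes a JDK 9+ runtime; collecting the responses first is one way to force every call, since building the list requires every mapped value.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RestCallCount {

    // Hypothetical stand-in for callRestApi: records the invocation instead of calling a server.
    static final AtomicInteger restCalls = new AtomicInteger();

    static String callRestApi(String url) {
        restCalls.incrementAndGet();
        return "response from " + url;
    }

    // Risky: on JDK 9+ count() can answer from the source size and skip map() entirely.
    static int callsViaCount() {
        restCalls.set(0);
        long count = Stream.of("url1", "url2")
                .map(RestCallCount::callRestApi)
                .count();
        System.out.println("count=" + count);
        return restCalls.get();
    }

    // Safer: the list needs every mapped value, so every call is made.
    static int callsViaCollect() {
        restCalls.set(0);
        List<String> responses = Stream.of("url1", "url2")
                .map(RestCallCount::callRestApi)
                .collect(Collectors.toList());
        System.out.println("count=" + responses.size());
        return restCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("calls via count():   " + callsViaCount());   // typically 0 on JDK 9+
        System.out.println("calls via collect(): " + callsViaCollect()); // 2
    }
}
```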
Upvotes: 29
Reputation: 18959
The optimization referenced in this thread follows from the well-known architecture of Java streams, which is based on lazy computation.
Streams are lazy; computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed. (Javadoc)
Also
Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed. (Javadoc)
This lazy computation affects several other operations, not just .peek. Just as peek (an intermediate operation) is affected by lazy computation, so are all the other intermediate operations (filter, map, mapToInt, mapToDouble, mapToLong, flatMap, flatMapToInt, flatMapToDouble, flatMapToLong). But someone who doesn't understand the concept of lazy computation can easily fall into the trap with .peek that Sonar reports here.
So the example that Sonar correctly flags
Stream.of("one", "two", "three", "four")
.filter(e -> e.length() > 3)
.peek(e -> System.out.println("Filtered value: " + e));
should not be used as is, because the above example has no terminal operation. So the stream will not invoke the intermediate .peek operation at all, even though 2 elements ("three", "four") are eligible to pass through the stream pipeline.
Example 1. Add a terminal operation like the following:
Stream.of("one", "two", "three", "four")
.filter(e -> e.length() > 3)
.peek(e -> System.out.println("Filtered value: " + e))
.collect(Collectors.toList()); // <----
and the elements that pass the filter will also pass through the intermediate .peek operation. No element is skipped in this example.
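A runnable version of Example 1 (the wrapper class is just for illustration) that records what peek() sees; with collect() on this sequential stream, both filtered elements go through peek().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PeekWithCollect {

    static List<String> peekedWithCollect() {
        List<String> peeked = new ArrayList<>();
        List<String> result = Stream.of("one", "two", "three", "four")
                .filter(e -> e.length() > 3)
                .peek(peeked::add)              // sequential stream, so a plain ArrayList is safe
                .collect(Collectors.toList());
        System.out.println("result=" + result + ", peeked=" + peeked);
        return peeked;
    }

    public static void main(String[] args) {
        peekedWithCollect(); // prints result=[three, four], peeked=[three, four]
    }
}
```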
Example 2. Now here is the interesting part: if you use some other terminal operation, for example .findFirst, then, because the Stream API is based on lazy computation,
Stream.of("one", "two", "three", "four")
.filter(e -> e.length() > 3)
.peek(e -> System.out.println("Filtered value: " + e))
.findFirst(); // <----
only 1 element will pass through the .peek operation, not 2.
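The same pipeline with findFirst(), again wrapped in an illustrative class: the short-circuiting terminal operation stops as soon as "three" arrives, so "four" never reaches peek().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

public class PeekWithFindFirst {

    static List<String> peekedWithFindFirst() {
        List<String> peeked = new ArrayList<>();
        Optional<String> first = Stream.of("one", "two", "three", "four")
                .filter(e -> e.length() > 3)
                .peek(peeked::add)
                .findFirst();                   // short-circuits after the first match
        System.out.println("first=" + first.orElse("none") + ", peeked=" + peeked);
        return peeked;
    }

    public static void main(String[] args) {
        peekedWithFindFirst(); // prints first=three, peeked=[three]
    }
}
```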
But as long as you know what you are doing (example 1) and understand lazy computation, you can expect that in certain cases .peek will be invoked for every element passing down the stream, with no element skipped, and in other cases you will know which elements will be skipped by .peek.
But be extremely careful if you use .peek with parallel streams, since there is another set of traps there. As the Java API for .peek mentions:
For parallel stream pipelines, the action may be called at whatever time and in whatever thread the element is made available by the upstream operation. If the action modifies shared state, it is responsible for providing the required synchronization.
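A sketch of what "providing the required synchronization" can look like, using an AtomicInteger as the shared state (one possible choice, not the only one). Because collect() needs every element, peek() runs once per element here, but those calls can happen concurrently on ForkJoinPool threads.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelPeek {

    static int peekCallsInParallel(int n) {
        // Thread-safe counter: a plain int incremented with ++ from several threads could lose updates.
        AtomicInteger calls = new AtomicInteger();
        List<Integer> doubled = IntStream.rangeClosed(1, n)
                .parallel()
                .boxed()
                .peek(i -> calls.incrementAndGet())
                .map(i -> i * 2)
                .collect(Collectors.toList());
        System.out.println("collected " + doubled.size() + " elements, peek calls=" + calls.get());
        return calls.get();
    }

    public static void main(String[] args) {
        peekCallsInParallel(10_000); // prints collected 10000 elements, peek calls=10000
    }
}
```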
Upvotes: 4