Reputation: 7367
I'm reading Java 8 book by Richard Warburton and came up with this:
Some operation are more expensive on ordered stream. This problem can be solved by eliminating ordering. To do so, call the stream's
unordered
method. [...]
I was quite perplexed by that. Suppose we have Stream<Integer> stream = Arrays.asList(1, 2, 3, 4).stream();
Since List<Integer>
defines an encounter order for the stream (some) operation might be performed inefficiently. Why is that?
How does it affect the processing and what makes it slow? In order to make things faster, in that case, should we be calling this as
Stream<Integer> stream = Arrays.asList(1, 2, 3, 4).stream().unordered();
? Sounds strange, to say the least...
Upvotes: 5
Views: 1291
Reputation: 36107
This is explained in detail in the documentation: https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
Ordering
Streams may or may not have a defined encounter order. Whether or not a stream has an encounter order depends on the source and the intermediate operations. Certain stream sources (such as List or arrays) are intrinsically ordered, whereas others (such as HashSet) are not. Some intermediate operations, such as sorted(), may impose an encounter order on an otherwise unordered stream, and others may render an ordered stream unordered, such as BaseStream.unordered(). Further, some terminal operations may ignore encounter order, such as forEach().If a stream is ordered, most operations are constrained to operate on the elements in their encounter order; if the source of a stream is a List containing [1, 2, 3], then the result of executing map(x -> x*2) must be [2, 4, 6]. However, if the source has no defined encounter order, then any permutation of the values [2, 4, 6] would be a valid result. For sequential streams, the presence or absence of an encounter order does not affect performance, only determinism. If a stream is ordered, repeated execution of identical stream pipelines on an identical source will produce an identical result; if it is not ordered, repeated execution might produce different results.
For parallel streams, relaxing the ordering constraint can sometimes enable more efficient execution. Certain aggregate operations, such as filtering duplicates (distinct()) or grouped reductions (Collectors.groupingBy()) can be implemented more efficiently if ordering of elements is not relevant. Similarly, operations that are intrinsically tied to encounter order, such as limit(), may require buffering to ensure proper ordering, undermining the benefit of parallelism. In cases where the stream has an encounter order, but the user does not particularly care about that encounter order, explicitly de-ordering the stream with unordered() may improve parallel performance for some stateful or terminal operations. However, most stream pipelines, such as the "sum of weight of blocks" example above, still parallelize efficiently even under ordering constraints.
Upvotes: 10