Reputation: 38910
I am trying to understand Java 8 streams in detail.
From the Oracle documentation page on streams:
Streams differ from collections in several ways:
No storage. A stream is not a data structure that stores elements; instead, it conveys elements from a source such as a data structure, an array, a generator function, or an I/O channel, through a pipeline of computational operations.
Stream operations and pipelines
Stream operations are divided into intermediate and terminal operations, and are combined to form stream pipelines.
A stream pipeline consists of a source (such as a Collection, an array, a generator function, or an I/O channel); followed by zero or more intermediate operations such as Stream.filter or Stream.map; and a terminal operation such as Stream.forEach or Stream.reduce.
Intermediate operations return a new stream
Apart from the documentation, I have gone through a related SE question:
How does streams in Java affect memory consumption?
Everywhere I looked, it was stated that no additional memory is consumed because the stream operations are pipelined: the original stream is simply passed through a pipeline.
One working example from Benjamin's blog:
    List<String> myList =
        Arrays.asList("a1", "a2", "b1", "c2", "c1");

    myList
        .stream()
        .filter(s -> s.startsWith("c"))
        .map(String::toUpperCase)
        .sorted()
        .forEach(System.out::println);
But when intermediate operations like filter, map and sorted each return a new stream, how come this does not increase memory consumption? Am I missing something here?
Upvotes: 3
Views: 1446
Reputation: 770
Try reading http://www.oracle.com/technetwork/articles/java/ma14-java-se-8-streams-2177646.html and/or http://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/; I think the concepts you're trying to figure out are explained fairly well there.
Basically, most intermediate operations are not run to completion over the whole input before the next one starts. Instead, elements are processed one at a time through all of the intermediate operations and are then either discarded or put into a collection, added to a sum, printed and so on, depending on the terminal operation. If it is a collect-type terminal operation, there will of course be some memory overhead for building the new collection, but the stream itself does not store the elements. This is also (partly) why you cannot iterate over a stream twice.
There are, however, some operations, such as stream.sorted(func), that need to keep state during processing: sorted cannot emit its first element until it has seen all of its input.
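To make the one-element-at-a-time behaviour visible, here is a small sketch that logs which stage touches which element. The class name LazyTraceDemo and the trace() helper are made up for illustration, and the logging lambdas have side effects only for the sake of the demo, assuming a sequential stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LazyTraceDemo {

    // Returns the order in which the pipeline stages see each element.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        List<String> myList = Arrays.asList("a1", "a2", "b1", "c2", "c1");

        myList.stream()
              .filter(s -> { log.add("filter " + s); return s.startsWith("c"); })
              .map(s -> { log.add("map " + s); return s.toUpperCase(); })
              .forEach(s -> log.add("forEach " + s));

        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

Running main should show each element travelling through filter, map and forEach before the next element is even filtered ("filter c2", "map c2", "forEach C2", then "filter c1", ...), so no intermediate list of filtered or mapped elements is ever built.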
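A stateful operation like sorted behaves differently, which a similar logging sketch can show. Again, SortedBarrierDemo and trace() are hypothetical names, and this assumes a sequential stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class SortedBarrierDemo {

    // Returns the order in which filter and forEach see the elements
    // when a sorted() step sits between them.
    static List<String> trace() {
        List<String> log = new ArrayList<>();

        Stream.of("c2", "a1", "c1")
              .filter(s -> { log.add("filter " + s); return true; })
              .sorted()
              .forEach(s -> log.add("forEach " + s));

        return log;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

Here all three "filter" lines should appear before the first "forEach" line, because sorted has to collect its whole input before it can pass anything downstream. That buffering is the extra state such an operation needs.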
Upvotes: 2
Reputation: 726519
I think you interpreted the "no storage" section of the documentation too literally, as "no memory increase." This interpretation is incorrect: "no storage" means "no storage for stream elements". The stream object itself represents a fixed overhead, in the same way as an empty collection has some overhead, so the size of the stream object itself does not count.
But when intermediate operations like filter, map and sorted return a new stream, how come it does not increase memory consumption?
It does. However, the increase in size is fixed, i.e. an O(1) increase. This is in contrast with collections, where the increase for making a copy of a collection of n elements is O(n).
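One way to convince yourself that calling filter or map only creates a small pipeline object, rather than a copy of the data, is to count how many elements the lambdas have seen before and after the terminal operation. This is only a sketch: LazyBuildDemo and counts() are invented names, and it demonstrates that no elements are touched while the pipeline is built, not an exact memory measurement.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class LazyBuildDemo {

    // Returns {elements seen before the terminal op,
    //          elements seen after it,
    //          elements that reached forEach}.
    static int[] counts() {
        List<Integer> source =
            IntStream.range(0, 1000).boxed().collect(Collectors.toList());
        AtomicInteger seen = new AtomicInteger();
        AtomicInteger emitted = new AtomicInteger();

        // Building the pipeline: two new Stream objects, no element processed yet.
        Stream<Integer> pipeline = source.stream()
            .filter(i -> { seen.incrementAndGet(); return i % 2 == 0; })
            .map(i -> i * 2);
        int before = seen.get();

        // Only the terminal operation pulls elements through the pipeline.
        pipeline.forEach(i -> emitted.incrementAndGet());
        int after = seen.get();

        return new int[] { before, after, emitted.get() };
    }

    public static void main(String[] args) {
        int[] c = counts();
        System.out.println("before=" + c[0] + " after=" + c[1] + " emitted=" + c[2]);
    }
}
```

The count taken right after filter and map have returned is still zero, whatever the size of the source list, which matches the point that each intermediate stream adds a fixed cost independent of n.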
Upvotes: 7