Vincent Tjeng

Reputation: 743

When a Collection is converted to a Stream and collected again, does the resulting Collection maintain any links to the original?

When working with a Collection in Java, I regularly convert it to a Stream, process it, collect the results, and return the resulting Collection. For example:

 static Set<String> getTopUsers(Set<String> users){
      Set<String> topUsers = users.stream().filter((String s) -> isTop(s)).collect(Collectors.toSet());
      return topUsers;
 }

 static boolean isTop(String user){
      // some logic
 }

Does the topUsers return value have any link to the original? For instance, could adding and removing elements from users result in any changes in topUsers, and vice-versa? I'm asking because I haven't been copying my parameters (e.g. users in this case) as I pass them in, and I'm wondering whether I should.

(I've looked at the documentation for Stream, and it mentions that "an operation on a stream produces a result, but does not modify its source", but I wanted to be sure that there's nothing I'm missing.)

Upvotes: 1

Views: 102

Answers (2)

GPI

Reputation: 9328

A few relevant passages from the documentation you quoted:

Stream operations are divided into intermediate and terminal operations, and are combined to form stream pipelines. A stream pipeline consists of a source (such as a Collection, an array, a generator function, or an I/O channel); followed by zero or more intermediate operations such as Stream.filter or Stream.map; and a terminal operation such as Stream.forEach or Stream.reduce.

Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed.

It follows that, as long as you have not called a terminal operation, the source of your stream has not been traversed yet.
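The laziness described above can be observed directly. The following is a minimal sketch, not taken from the documentation: the class name, the sample user names, and the logging predicate are all made up for illustration. It records when the filter predicate actually runs relative to the point where the pipeline is built.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyFilterDemo {
    // Returns a log showing when the predicate runs relative to pipeline construction.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        List<String> users = Arrays.asList("alice", "bob"); // illustrative data

        // Building the pipeline: filter() only registers the predicate here.
        Stream<String> pipeline = users.stream().filter(s -> {
            log.add("tested " + s);
            return s.startsWith("a"); // hypothetical stand-in for isTop
        });
        log.add("pipeline built");

        // The terminal operation triggers the traversal of the source.
        List<String> result = pipeline.collect(Collectors.toList());
        log.add("collected " + result);
        return log;
    }

    public static void main(String[] args) {
        // Prints: pipeline built, tested alice, tested bob, collected [alice]
        run().forEach(System.out::println);
    }
}
```

"pipeline built" is logged before any "tested ..." entry, which shows that the predicate is not invoked until collect runs.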

Terminal operations, such as Stream.forEach or IntStream.sum, may traverse the stream to produce a result or a side-effect. After the terminal operation is performed, the stream pipeline is considered consumed, and can no longer be used; if you need to traverse the same data source again, you must return to the data source to get a new stream. In almost all cases, terminal operations are eager, completing their traversal of the data source and processing of the pipeline before returning. Only the terminal operations iterator() and spliterator() are not; these are provided as an "escape hatch" to enable arbitrary client-controlled pipeline traversals in the event that the existing operations are not sufficient to the task.

Any form of reduction, including collect, is a terminal operation.

In your case, the users Set is traversed as soon as (but no sooner than) execution reaches the collect call. At that point the elements of users are read and processed, and collect returns only once the traversal is complete, so later updates to the original Set have no effect on the result. From then on, topUsers and users are "disconnected".
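As a quick check of that "disconnected" claim, here is a small sketch built around the method from the question. The body of isTop is a made-up stand-in, since the real logic is not shown in the post, and the class name and sample data are likewise illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

class IndependentResultDemo {
    // Hypothetical stand-in for the question's isTop; the real logic is not shown.
    static boolean isTop(String user) {
        return user.startsWith("a");
    }

    // Same shape as the method in the question.
    static Set<String> getTopUsers(Set<String> users) {
        return users.stream()
                .filter(IndependentResultDemo::isTop)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> users = new HashSet<>(Arrays.asList("alice", "bob"));
        Set<String> topUsers = getTopUsers(users);

        // Modify the source after collect() has returned.
        users.add("anna");
        users.remove("alice");

        System.out.println(topUsers.contains("alice")); // true: removal not propagated
        System.out.println(topUsers.contains("anna"));  // false: addition not propagated
    }
}
```

One caveat if you intend to modify the returned set: Collectors.toSet() makes no guarantee about the type or mutability of the Set it produces. Collectors.toCollection(HashSet::new) is the usual way to ask for a specific, mutable implementation.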

If you wanted your method to return a kind of live, filtered view of your original set using the Stream API, you could consider returning a Stream instead of a Set, expressing only intermediate operations and leaving the actual collection up to the caller.
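A sketch of that approach, assuming the source is a HashSet (whose spliterator is documented as late-binding) and using a made-up isTop. Changes made to users before the caller invokes the terminal operation are reflected in the result, but the stream itself can be consumed only once, so this is a deferred computation rather than a reusable view.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LateBindingDemo {
    // Hypothetical stand-in for the question's isTop.
    static boolean isTop(String user) {
        return user.startsWith("a");
    }

    // Only intermediate operations: nothing is traversed until the caller collects.
    static Stream<String> topUsers(Set<String> users) {
        return users.stream().filter(LateBindingDemo::isTop);
    }

    public static void main(String[] args) {
        Set<String> users = new HashSet<>(Arrays.asList("alice", "bob"));
        Stream<String> top = topUsers(users);

        users.add("anna"); // modified before the terminal operation starts

        Set<String> collected = top.collect(Collectors.toSet());
        System.out.println(collected.contains("anna")); // true

        try {
            top.collect(Collectors.toSet()); // a stream cannot be reused
        } catch (IllegalStateException e) {
            System.out.println("stream already consumed");
        }
    }
}
```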

Also note that a collections library such as Google Guava allows you to build a live-updating Collection from another collection, using a predicate function, in pretty much the same way you're doing here. It might be more appropriate for what you are seeking to achieve (beware of concurrency effects, though!).
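To illustrate what a "live" filtered collection means without pulling in the dependency, here is a hand-rolled, read-only sketch using only the standard library. It is not Guava's implementation; filteredView is a hypothetical helper, and it simply re-filters the backing set on every access.

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.function.Predicate;

class FilteredViewDemo {
    // Hypothetical helper: a read-only view that re-applies the predicate on each access.
    static <T> Set<T> filteredView(Set<T> backing, Predicate<? super T> keep) {
        return new AbstractSet<T>() {
            @Override
            public Iterator<T> iterator() {
                // A fresh traversal of the backing set every time.
                return backing.stream().filter(keep).iterator();
            }

            @Override
            public int size() {
                return (int) backing.stream().filter(keep).count();
            }
        };
    }

    public static void main(String[] args) {
        Set<String> users = new HashSet<>(Arrays.asList("alice", "bob"));
        Set<String> top = filteredView(users, s -> s.startsWith("a"));

        users.add("anna");
        System.out.println(top.size());            // 2: the view sees the new element

        users.remove("alice");
        System.out.println(top.contains("alice")); // false: the view sees the removal
    }
}
```

Every call costs a full pass over the backing set, and modifying users from another thread while the view is being iterated is unsafe for a plain HashSet, which is the kind of concurrency effect mentioned above.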

Upvotes: 1

Sotirios Delimanolis

Reputation: 280054

No, topUsers is a completely new Set with no relation to users. The Stream operations are executed, applying some transformation to the values and collecting the results in a new Set.

The values in the two sets might be the same (e.g. your filter might not cause the removal of any values), but the sets themselves are independent.

Upvotes: 3
