Tim B

Reputation: 41188

Using Java streams to put the last encountered value into a map

I have some code as follows:

Map<RiskFactor, RiskFactorChannelData> updateMap =
    updates.stream()
        .filter(this::updatedValueIsNotNull) // Remove null updated values
        .collect(Collectors.toMap(
            u -> u.getUpdatedValue().getKey(), // then merge into a map of key->value.
            Update::getUpdatedValue,
            (a, b) -> b)); // If two values have the same key then take the second value

Specifically I want to take the values from the list and put them into the map. That all works perfectly. My concern though is with ordering.

For example if the list has:

a1, b1, a2

How do I ensure that the final map contains:

a->a2
b->b1

Instead of

a->a1
b->b1

The incoming list is ordered, and stream().filter() should maintain that order, but I can't see anything in the documentation of Collectors.toMap about the ordering of its inputs.

Is this safe in the general case or have I just been lucky on my test cases so far? Am I going to be JVM dependent and at risk of this changing in the future?

This is very simple to guarantee if I just write a for loop, but the "fuzziness" of potential stream behavior is making me concerned.

I'm not planning to use parallel for this; I'm purely seeking to understand the behavior in the case of a sequential, non-parallel stream that reaches toMap.

Upvotes: 8

Views: 866

Answers (2)

Holger

Reputation: 298193

The term “most recent value” is a bit misleading. Since you want the last value according to encounter order, the answer is that toMap will respect the encounter order.

Its documentation refers to Map.merge to explain the semantics of the merge function, but unfortunately, that documentation is a bit thin too. It doesn’t mention the fact that this function is invoked with (oldValue,newValue) explicitly; it can only be deduced from the code example.
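As a minimal sketch of that (oldValue, newValue) argument order, the following reproduces the a1, b1, a2 example from the question on a sequential stream. The class name LastValueWins and the helper keyOf (which treats the first character of each item as its key) are hypothetical, chosen only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class LastValueWins {

    // Hypothetical key extraction: the first character of an item is its key.
    static String keyOf(String item) {
        return item.substring(0, 1);
    }

    // Collects items into key -> value, keeping the later value on a key collision.
    static Map<String, String> lastValuePerKey(List<String> items) {
        return items.stream()
                .collect(Collectors.toMap(
                        LastValueWins::keyOf,
                        Function.identity(),
                        // First argument: value already in the map; second: the newly encountered one.
                        (oldValue, newValue) -> newValue));
    }

    public static void main(String[] args) {
        Map<String, String> result = lastValuePerKey(List.of("a1", "b1", "a2"));
        System.out.println(result.get("a")); // a2
        System.out.println(result.get("b")); // b1
    }
}
```

Swapping the merge function to (oldValue, newValue) -> oldValue would keep a1 instead, which is a quick way to confirm which argument is which.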

toMap’s documentation further states:

The returned Collector is not concurrent. For parallel stream pipelines, the combiner function operates by merging the keys from one map into another, which can be an expensive operation. If it is not required that results are merged into the Map in encounter order, using toConcurrentMap(Function, Function, BinaryOperator, Supplier) may offer better parallel performance.

So it explicitly directs you to a different collector if encounter order is not required. Generally, the built-in collectors provided by Collectors are unordered only if explicitly stated, which is the case only for the “…Concurrent…” collectors and the toSet() collector.

Upvotes: 5

freedev

Reputation: 30087

It is safe: Collection.stream() creates a sequential stream.

I suggest taking a look at Collectors.toMap: in case of collisions, its merge function decides which value is kept. In your case you should keep the more recent one.

The part you're interested in is (a, b) -> b, where you arbitrarily choose the second element; that is where you should choose the more recent one.

I think your problem comes from not being sure about the processing order. If you want to continue using streams (instead of a for loop), you could enforce sequential processing by adding .sequential() after .stream().

Another way, which I would prefer, is to add a timestamp to RiskFactorChannelData; then you could even use a parallel stream.
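A rough sketch of the timestamp idea, under stated assumptions: TimedValue is a hypothetical stand-in for RiskFactorChannelData with an added timestamp field, and NewestByTimestamp and newestPerKey are illustrative names, not part of the original code. The merge function picks the value with the larger timestamp, so the result does not depend on the order in which elements are processed, provided that timestamps are distinct per key.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

final class NewestByTimestamp {

    // Hypothetical stand-in for RiskFactorChannelData with an added timestamp.
    static final class TimedValue {
        final String key;
        final String value;
        final long timestamp;

        TimedValue(String key, String value, long timestamp) {
            this.key = key;
            this.value = value;
            this.timestamp = timestamp;
        }

        String getKey() { return key; }
        long getTimestamp() { return timestamp; }
    }

    // Keeps, for each key, the value with the largest timestamp.
    static Map<String, TimedValue> newestPerKey(List<TimedValue> values) {
        BinaryOperator<TimedValue> newer =
                BinaryOperator.maxBy(Comparator.comparingLong(TimedValue::getTimestamp));
        return values.parallelStream()
                .collect(Collectors.toMap(TimedValue::getKey, Function.identity(), newer));
    }

    public static void main(String[] args) {
        // Deliberately not in timestamp order.
        List<TimedValue> values = List.of(
                new TimedValue("a", "a2", 3L),
                new TimedValue("b", "b1", 2L),
                new TimedValue("a", "a1", 1L));
        System.out.println(newestPerKey(values).get("a").value); // a2
    }
}
```

If two values for the same key can share a timestamp, the choice between them falls back on the merge order, so a tie-breaking rule would be needed for a fully order-independent result.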

Upvotes: 2
