Reputation: 1
public List<XYZ> getFilteredList(List<XYZ> l1) {
    return l1
            .stream()
            .filter(distinctByKey(XYZ::getName))
            .filter(distinctByKey(XYZ::getPrice))
            .collect(Collectors.toList());
}

private static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
    Map<Object, Boolean> seen = new ConcurrentHashMap<>();
    return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}
Can anyone please help me understand the meaning of this line?
return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
Why is the lambda result compared to null?
Upvotes: 0
Views: 2243
Reputation: 109547
Here there is indeed something to comprehend.
Stream.filter is passed a Predicate to test whether elements of the stream should be processed or not.
distinctByKey is called only once for each filter call, when the pipeline is built, and returns the predicate. So there is one single predicate instance per filter.
The lambda t -> ... uses the local variable seen. The Map object survives the end of the function call and stays available for the life of the lambda. Technically this is done by having a local variable seen that is lost when the function ends, and a second variable seen captured in the predicate instance, referring to the same Map object.
So the seen of the predicate can be tested for every stream element. The intention is to return true if the key has not yet been stored in seen.
The ConcurrentHashMap keeps this safe when the predicate is used with l1.parallelStream().
.put would have worked too, but would theoretically replace the old Boolean.TRUE with a new Boolean.TRUE. So putIfAbsent is indeed better, doing less.
There is no ConcurrentHashSet class in the JDK, and a Collections.synchronizedSet wrapper would have to wrap another set and lock on every call.
Still better would be to use ConcurrentHashMap.newKeySet(), as in another answer. So the approach has some quirks but is fine.
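To make the points above concrete, here is a minimal sketch. The class name PutIfAbsentDemo and the sample strings are made up for illustration; distinctByKey is the method from the question, with the key type relaxed to a wildcard. It first shows what putIfAbsent returns on the first and second insert of the same key, then shows that a single predicate instance remembers keys between calls.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

public class PutIfAbsentDemo {

    // Same idea as in the question: the returned lambda captures the map "seen".
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        // First insert: no previous value, so putIfAbsent returns null.
        System.out.println(seen.putIfAbsent("apple", Boolean.TRUE)); // null
        // Second insert: the key exists, so the existing value is returned.
        System.out.println(seen.putIfAbsent("apple", Boolean.TRUE)); // true

        // One predicate instance, one captured map, reused for every test() call.
        Predicate<String> firstOfEachLength = distinctByKey(String::length);
        System.out.println(firstOfEachLength.test("one"));   // true  (length 3 is new)
        System.out.println(firstOfEachLength.test("two"));   // false (length 3 already seen)
        System.out.println(firstOfEachLength.test("three")); // true  (length 5 is new)
    }
}
```

The comparison with null is therefore just a way of asking "was this key absent before this call?".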
Upvotes: 0
Reputation: 26
In addition to the answer above, I have found a clearer approach:
private <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = new HashSet<>();
    // add returns true when the key was added, and false if it was already present
    return t -> seen.add(keyExtractor.apply(t));
}
Use new HashSet<>() if you don't plan to use the predicate in parallel streams; a concurrent set is overkill there. For parallel processing, better use a concurrent set such as ConcurrentHashMap.newKeySet() (the JDK has no ConcurrentHashSet class), like:
private <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> seen.add(keyExtractor.apply(t));
}
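As a rough illustration of the Set-based variant, the sketch below uses a made-up class SetAddDemo and a hypothetical helper distinctByLength. It shows the return value of add for a new and a repeated key, and then runs the predicate on a parallel stream. Only the number of survivors is checked, because in a parallel stream it is not deterministic which of several elements with the same key gets through.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class SetAddDemo {

    // Thread-safe variant: the captured set is backed by a ConcurrentHashMap.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Hypothetical helper: keeps one word per distinct length, using a parallel stream.
    static List<String> distinctByLength(List<String> words) {
        return words.parallelStream()
                .filter(distinctByKey(String::length))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        System.out.println(seen.add("A")); // true  (newly added)
        System.out.println(seen.add("A")); // false (already present)

        List<String> words = List.of("a", "bb", "cc", "ddd", "e");
        // Three distinct lengths (1, 2, 3), so three words survive.
        System.out.println(distinctByLength(words).size()); // 3
    }
}
```

With a plain HashSet the same code would work on a sequential stream, but concurrent add calls from a parallel stream would not be safe.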
Upvotes: 0
Reputation: 40034
Your question revolves around the following:
    return t -> seen.putIfAbsent(keyExtractor.apply(t),
            Boolean.TRUE) == null;
The lambda is a closure: it keeps access to the map via seen even though the local variable seen is out of scope once distinctByKey has returned.
keyExtractor retrieves the key (either name or price in your example) via the getters provided as method references (e.g. XYZ::getName).
putIfAbsent tries to add the boolean value true to the map for the supplied key (in this case, the name or price from the keyExtractor). If the key was already present, it returns the existing value, which is true. Since true is not equal to null, false is returned and the filter doesn't pass the element. If the key was not there, null is returned. Since null == null is true, true is returned and the element passes through the filter (i.e. it is distinct so far).
Here is an example of how this works. It uses a simple record and applies a filter on name only.
record XYZ(String getName, String other) {
    @Override
    public String toString() {
        return String.format("[name=%s, other=%s]", getName, other);
    }
}

public static void main(String[] args) {
    List<XYZ> l1 = List.of(
            new XYZ("A", "B"),
            new XYZ("B", "B"),
            new XYZ("B", "C"),
            new XYZ("B", "D"),
            new XYZ("C", "B"));
    Object ob =
            l1.stream().filter(distinctByKey(XYZ::getName))
                    .collect(Collectors.toList());
    System.out.println(ob);
}
prints
[[name=A, other=B], [name=B, other=B], [name=C, other=B]]
Notice that only the first element with the name B was allowed through the filter; the others were blocked.
The distinctByKey method used here is the one from the question:
private static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
    Map<Object, Boolean> seen = new ConcurrentHashMap<>();
    return t -> seen.putIfAbsent(keyExtractor.apply(t),
            Boolean.TRUE) == null;
}
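Since the question chains two filters, one on name and one on price, here is a small sketch of how they interact on a sequential stream. The Product record, the class ChainedFilterDemo, the helper survivingNames and the sample data are all hypothetical stand-ins for the XYZ class, which the question does not show. Each distinctByKey call creates its own map, and an element that the name filter rejects never reaches the price filter, so its price is not recorded.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ChainedFilterDemo {

    // Hypothetical stand-in for the question's XYZ class.
    record Product(String name, int price) {}

    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    // Applies the two filters from the question and returns the surviving names.
    static List<String> survivingNames(List<Product> products) {
        return products.stream()
                .filter(distinctByKey(Product::name))   // own map of names
                .filter(distinctByKey(Product::price))  // separate map of prices
                .map(Product::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Product> products = List.of(
                new Product("A", 10),  // passes both filters
                new Product("A", 20),  // rejected by name; price 20 is never recorded
                new Product("B", 10),  // name is new, but price 10 was already seen
                new Product("C", 30),  // passes both filters
                new Product("D", 20)); // passes: price 20 never reached the price map
        System.out.println(survivingNames(products)); // [A, C, D]
    }
}
```

So the chained version removes an element when either its name or its price has already been accepted by the corresponding filter, which is not the same as removing duplicates of the (name, price) pair.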
Upvotes: 1