Darkwriter

Reputation: 1

Java 8 distinctByKey

public List<XYZ> getFilteredList(List<XYZ> l1) {
    return l1
            .stream()
            .filter(distinctByKey(XYZ::getName))
            .filter(distinctByKey(XYZ::getPrice))
            .collect(Collectors.toList());
}

private static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
    Map<Object, Boolean> seen = new ConcurrentHashMap<>();
    return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}

Can anyone please help me understand what this line means?
return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;

Why is the lambda result compared to null?

Upvotes: 0

Views: 2243

Answers (3)

Joop Eggen

Reputation: 109547

There is indeed something subtle to understand here.

  • Stream.filter is passed a Predicate that tests whether each element of the stream should be kept or not.
  • This predicate is the result of one single call to distinctByKey, so there is exactly one predicate instance per filter (the question's code creates two, each with its own map).
  • The lambda predicate t -> ... uses the local variable seen. The Map object it refers to survives the end of the distinctByKey call and stays available for the whole life of the lambda. Technically, the local variable seen is lost when the method returns, but the predicate instance holds its own copy of that variable, referring to the same Map object.
  • So the predicate's seen map can be consulted for every stream element.
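A minimal sketch of that capture, assuming a hypothetical class named ClosureDemo and using String::length as an illustrative key. It declares the key extractor as Function<? super T, ?> instead of Function<? super T, Object>; the body works the same either way. One predicate remembers keys across test calls, while a second call to distinctByKey starts with an empty map:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

class ClosureDemo {

    // Each call creates a new map; the returned lambda captures it as "seen".
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        // One predicate instance, one captured map (keys are string lengths).
        Predicate<String> byLength = distinctByKey(String::length);
        System.out.println(byLength.test("a"));  // true: length 1 not stored yet
        System.out.println(byLength.test("b"));  // false: length 1 already in this map
        System.out.println(byLength.test("cd")); // true: length 2 is new

        // A second call builds a second, independent map.
        Predicate<String> fresh = distinctByKey(String::length);
        System.out.println(fresh.test("b"));     // true: this map started empty
    }
}
```

This is also why the two filter(...) calls in the question do not share state: each call to distinctByKey produces its own map.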

The intention is to return true only if the key has not yet been stored in seen.

  • The map is concurrent, so you may also call l1.parallelStream().
  • The test works because putIfAbsent returns the value already associated with the key, or null if there was none (in which case it stores the new value). That makes it ideal for keeping a set of unique keys and detecting duplicates. put would have worked too, since it also returns the old value or null, but it would overwrite the old Boolean.TRUE with Boolean.TRUE again. So putIfAbsent is indeed better: it does less.
  • A Set would be cleaner than storing the dummy value Boolean.TRUE. There is no ConcurrentHashSet class in the JDK, and Collections.synchronizedSet merely wraps an existing set so that every call is synchronized on a single lock. Better still is ConcurrentHashMap.newKeySet(), as in the other answer.

So the approach has some quirks but is fine.
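The return values that the test relies on can be checked directly. This is a small sketch with a hypothetical class PutIfAbsentDemo and a hypothetical helper isNewKey, comparing the putIfAbsent test with the Set.add variant backed by ConcurrentHashMap.newKeySet():

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class PutIfAbsentDemo {

    // The question's test: true only if the key was absent (putIfAbsent then returns null).
    static boolean isNewKey(Map<Object, Boolean> seen, Object key) {
        return seen.putIfAbsent(key, Boolean.TRUE) == null;
    }

    // Set-based equivalent: add() already returns true only for a new element.
    static boolean isNewKey(Set<Object> seen, Object key) {
        return seen.add(key);
    }

    public static void main(String[] args) {
        Map<Object, Boolean> map = new ConcurrentHashMap<>();
        System.out.println(map.putIfAbsent("B", Boolean.TRUE)); // null: "B" was absent, TRUE is now stored
        System.out.println(map.putIfAbsent("B", Boolean.TRUE)); // true: existing value returned, map unchanged
        System.out.println(isNewKey(map, "C"));                 // true
        System.out.println(isNewKey(map, "C"));                 // false

        Set<Object> set = ConcurrentHashMap.newKeySet();
        System.out.println(isNewKey(set, "C"));                 // true
        System.out.println(isNewKey(set, "C"));                 // false
    }
}
```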

Upvotes: 0

Oleksandr Bunin

Reputation: 26

Apart from the other answers, I have found a clearer approach:

    private <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {

        Set<Object> seen = new HashSet<>();
        return t -> seen.add(keyExtractor.apply(t)); // returns true when item was added, and false if not
    }

Use new HashSet<>() if you don't plan to use it with parallel streams; a concurrent set is overkill there. For parallel processing, you should use a concurrent set instead (the JDK has no ConcurrentHashSet class, but ConcurrentHashMap.newKeySet() returns one), like

    private <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {

        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }
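A sketch of the concurrent version in a parallel stream, assuming Java 8+ and a hypothetical class ParallelDistinctDemo. Keying the numbers 0 to 999 by their remainder mod 10 should always keep exactly ten elements, one per key, because add on the concurrent key set is atomic. Which element wins for each key is not guaranteed to be the first in encounter order, since threads race to add it:

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelDistinctDemo {

    // Thread-safe variant: the concurrent key set makes add() atomic.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    // Keeps one number per remainder mod 10 from 0..999, processed in parallel.
    static List<Integer> onePerRemainder() {
        return IntStream.range(0, 1000)
                .boxed()
                .parallel()
                .filter(distinctByKey((Integer i) -> i % 10))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> kept = onePerRemainder();
        System.out.println(kept.size()); // 10: exactly one element per key
        System.out.println(kept);        // which ten numbers survive may vary between runs
    }
}
```

With a plain HashSet in the same pipeline, concurrent add calls could corrupt the set or let duplicates through, which is why the concurrent set matters here.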

Upvotes: 0

WJS

Reputation: 40034

Your question revolves around the following:

return t -> seen.putIfAbsent(keyExtractor.apply(t),
                Boolean.TRUE) == null;
  • First, the return statement returns the entire lambda (from t -> onward); nothing is evaluated yet. The lambda still references the created Map as a closure via seen, even though the local variable seen goes out of scope when distinctByKey returns.
  • The keyExtractor retrieves the key (the name or the price in your example) via the getters supplied as method references (e.g. XYZ::getName).
  • putIfAbsent tries to add the value Boolean.TRUE to the map under the supplied key (the name for the first filter, the price for the second, since each filter has its own map). If the key was already present, it returns the existing value, which is TRUE. Since TRUE is not equal to null, the predicate returns false and the filter drops the element. If the key was not there, TRUE is stored and null is returned. Since null == null is true, the predicate returns true and the element passes through the filter (i.e. it is distinct so far).

Here is an example of how this works. It uses a simple record and applies the filter on name only.

record XYZ(String getName, String other){
    @Override
    public String toString() {
        return String.format("[name=%s, other=%s]", getName, other);
    }
}
    
public static void main(String[] args) {
    List<XYZ> l1 = List.of(
            new XYZ("A", "B"),
            new XYZ("B", "B"),
            new XYZ("B", "C"),
            new XYZ("B", "D"),
            new XYZ("C", "B"));

    List<XYZ> distinctByName =
            l1.stream().filter(distinctByKey(XYZ::getName))
                    .collect(Collectors.toList());
    System.out.println(distinctByName);
}

prints

[[name=A, other=B], [name=B, other=B], [name=C, other=B]]

Notice that only the first element with name B was allowed through the filter; the others were blocked.

private static <T> Predicate<T>
        distinctByKey(Function<? super T, Object> keyExtractor) {
    Map<Object, Boolean> seen = new ConcurrentHashMap<>();
    return t -> seen.putIfAbsent(keyExtractor.apply(t),
            Boolean.TRUE) == null;
}
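The example above filters on name only. The question chains two filters, and each distinctByKey call has its own map, so the second filter only compares prices among elements that survived the first. That is not the same as deduplicating on the (name, price) pair. Below is a sketch with a hypothetical XYZ holding just a name and an int price; the class ChainedVsCompositeDemo and the methods chained() and composite() are illustrative names, and composite() uses a List as the pair key, relying on List's equals and hashCode:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class ChainedVsCompositeDemo {

    // Hypothetical stand-in for the question's XYZ class.
    static final class XYZ {
        private final String name;
        private final int price;
        XYZ(String name, int price) { this.name = name; this.price = price; }
        String getName() { return name; }
        int getPrice() { return price; }
        @Override public String toString() { return name + "/" + price; }
    }

    // Same helper as in the question.
    static <T> Predicate<T> distinctByKey(Function<? super T, Object> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
    }

    static final List<XYZ> ITEMS = Arrays.asList(
            new XYZ("A", 1), new XYZ("A", 2), new XYZ("B", 1), new XYZ("C", 3));

    // The question's pipeline: two independent filters, each with its own map.
    static List<XYZ> chained() {
        return ITEMS.stream()
                .filter(distinctByKey(XYZ::getName))
                .filter(distinctByKey(XYZ::getPrice))
                .collect(Collectors.toList());
    }

    // One filter on the (name, price) pair.
    static List<XYZ> composite() {
        return ITEMS.stream()
                .filter(distinctByKey((XYZ x) -> Arrays.asList(x.getName(), x.getPrice())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(chained());   // [A/1, C/3]
        System.out.println(composite()); // [A/1, A/2, B/1, C/3]
    }
}
```

In the chained version B/1 is dropped even though no other element is named B, because price 1 was already seen by the second filter.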

Upvotes: 1
