nimo23
nimo23

Reputation: 5698

simplify java stream to find duplicate properties

I have a users list and I want to find all users having duplicate names:

var allNames = users
              .stream()
              .map(u -> u.getName()).collect(Collectors.toList());

var duplicateNames = allNames
                .stream()
                .filter(i -> Collections.frequency(allNames, i) > 1)
                .collect(Collectors.toSet());

Can I improve/simplify the above solution?

For example, actually I create a list with all names and then filter it. How can I traverse the list to find its duplicate names without creating the additional list allNames?

Upvotes: 4

Views: 3883

Answers (3)

Holger
Holger

Reputation: 298409

One solution is

var duplicate = users.stream()
    .collect(Collectors.toMap(User::getName, u -> false, (x,y) -> true))
    .entrySet().stream()
    .filter(Map.Entry::getValue)
    .map(Map.Entry::getKey)
    .collect(Collectors.toSet());

This creates an intermediate Map<String,Boolean> to record which name is occurring more than once. You could use the keySet() of that map instead of collecting to a new Set:

var duplicate = users.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.toMap(User::getName, u -> false, (x,y) -> true, HashMap::new),
            m -> {
                m.values().removeIf(dup -> !dup);
                return m.keySet();
            }));

A loop solution can be much simpler:

HashSet<String> seen = new HashSet<>(), duplicate = new HashSet<>();
for(User u: users)
    if(!seen.add(u.getName())) duplicate.add(u.getName());

Upvotes: 7

nimo23
nimo23

Reputation: 5698

I find the solution with the help of @holger:

// collect all duplicate names with O(n)
var duplicateNames = all.stream()
                .collect(Collectors.groupingBy(Strategy::getName, Collectors.counting()))
                .entrySet()
                .stream()
                .filter(m -> m.getValue() > 1)
                .map(m -> m.getKey())
                .collect(Collectors.toList());

Is the performance of this solution O(n^2) or O(n)?

If someone can find improvements then please share.

Upvotes: 1

Andy Turner
Andy Turner

Reputation: 140494

Group by the names, find entries with more than one value:

Map<String, List<User>> grouped = users.stream()
    .collect(groupingBy(User::getName));

List<User> duplicated =
    grouped.values().stream()
        .filter(v -> v.size() > 1)
        .flatMap(List::stream)
        .collect(toList());

(You can do this in a single expression if you want. I only separated the steps to make it a little more clear what is happening).

Note that this does not preserve the order of the users from the original list.

Upvotes: 2

Related Questions