Jake

Reputation: 421

Detect duplicates by multiple properties of an entity using streams

I want to detect duplicate objects based on several of their properties. For example:

class Person {
    String name;
    Integer age;
    String address;

    // constructors
    // getters and setters
}

Out of these three fields, I want to check for duplicates using only two of them: name and age. I tried to do this with streams, but I suspect there is a simpler stream-based way.

Current approach:

List<Person> personList = new ArrayList<>();
personList.add(new Person("name1", 10, "address1"));
personList.add(new Person("name2", 20, "address1"));
personList.add(new Person("name1", 10, "address2"));
personList.add(new Person("name3", 10, "address2"));

// Want to detect name1 and age 10 as a duplicate entry

Map<String, Map<Integer, List<Person>>> nameAgePersonListMap = personList.stream()
          .collect(Collectors.groupingBy(i -> i.getName(), Collectors.groupingBy(i -> i.getAge())));
// and later checking each inner list for size() > 1
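For reference, here is a minimal runnable sketch of the current approach, assuming a plain Person class with a constructor and getters. The class name NestedGroupingDuplicates and the helpers sample() and findDuplicates() are illustrative names, not part of the question. It walks the two-level map and applies the size() > 1 check to each inner list:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NestedGroupingDuplicates {

    // Minimal stand-in for the question's Person class (assumed shape).
    static class Person {
        private final String name;
        private final Integer age;
        private final String address;

        Person(String name, Integer age, String address) {
            this.name = name;
            this.age = age;
            this.address = address;
        }

        String getName() { return name; }
        Integer getAge() { return age; }

        @Override
        public String toString() {
            return "Person(name=" + name + ", age=" + age + ", address=" + address + ")";
        }
    }

    // Hypothetical helper: the four sample entries from the question.
    static List<Person> sample() {
        return Arrays.asList(
                new Person("name1", 10, "address1"),
                new Person("name2", 20, "address1"),
                new Person("name1", 10, "address2"),
                new Person("name3", 10, "address2"));
    }

    // Hypothetical helper: group by name, then by age, and keep inner lists with size() > 1.
    static List<List<Person>> findDuplicates(List<Person> people) {
        Map<String, Map<Integer, List<Person>>> byNameThenAge = people.stream()
                .collect(Collectors.groupingBy(Person::getName,
                        Collectors.groupingBy(Person::getAge)));

        // Visit every inner map and keep only the groups holding more than one person.
        return byNameThenAge.values().stream()
                .flatMap(byAge -> byAge.values().stream())
                .filter(group -> group.size() > 1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // [[Person(name=name1, age=10, address=address1), Person(name=name1, age=10, address=address2)]]
        System.out.println(findDuplicates(sample()));
    }
}
```

This works, but each additional key property adds another level of map nesting, which is why a single composite key tends to be simpler.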

Is there a more efficient way to detect duplicates in this use case?

Upvotes: 4

Views: 2782

Answers (2)

Nowhere Man

Reputation: 19565

You can detect duplicates by building a composite key, grouping the elements into lists by that key, and then filtering by the size of each list.

The composite key may be created using various options which are more or less equivalent:

  • String concatenation: p -> p.getName() + "|" + p.getAge()
  • A separate wrapper class, as suggested earlier: p -> new PairKey(p.getName(), p.getAge())
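The two key options listed above can be compared side by side. This sketch assumes Java 16+ so that PairKey can be a record, which generates equals() and hashCode() over both components; the answer does not show PairKey's definition, so the record itself, the class name CompositeKeyDuplicates and the helpers duplicatesBy() and sample() are all assumptions:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CompositeKeyDuplicates {

    // Minimal stand-in for the question's Person class (assumed shape).
    static class Person {
        private final String name;
        private final Integer age;
        private final String address;

        Person(String name, Integer age, String address) {
            this.name = name;
            this.age = age;
            this.address = address;
        }

        String getName() { return name; }
        Integer getAge() { return age; }

        @Override
        public String toString() {
            return "Person(name=" + name + ", age=" + age + ", address=" + address + ")";
        }
    }

    // Assumed definition of the wrapper key: the record supplies equals() and hashCode().
    record PairKey(String name, Integer age) {}

    // Hypothetical helper: the four sample entries from the question.
    static List<Person> sample() {
        return Arrays.asList(
                new Person("name1", 10, "address1"),
                new Person("name2", 20, "address1"),
                new Person("name1", 10, "address2"),
                new Person("name3", 10, "address2"));
    }

    // Hypothetical helper: group by any composite key, keep groups with more than one element.
    static <K> List<List<Person>> duplicatesBy(List<Person> people, Function<Person, K> keyFn) {
        Map<K, List<Person>> groups = people.stream()
                .collect(Collectors.groupingBy(keyFn));
        return groups.values().stream()
                .filter(group -> group.size() > 1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Option 1: string concatenation key.
        System.out.println(duplicatesBy(sample(), p -> p.getName() + "|" + p.getAge()));
        // Option 2: wrapper-class key.
        System.out.println(duplicatesBy(sample(), p -> new PairKey(p.getName(), p.getAge())));
    }
}
```

With the string key the separator matters: without it, name "a1" with age 0 and name "a" with age 10 would both become "a10" and be reported as duplicates.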

In this specific case, an AbstractMap.SimpleImmutableEntry<> can be used, since the key consists of exactly two values:

List<List<Person>> duplicateByNameAge = personList.stream()
                .collect(Collectors.groupingBy(
                    p -> new AbstractMap.SimpleImmutableEntry<>(p.getName(), p.getAge()), 
                    Collectors.toList()))
                .entrySet().stream().filter(e -> e.getValue().size() > 1)
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());

System.out.println(duplicateByNameAge);

Output:

[[Person(name=name1, age=10, address=address1), Person(name=name1, age=10, address=address2)]]
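If only the duplicated (name, age) pairs are needed rather than the Person objects themselves, the same SimpleImmutableEntry key can be combined with Collectors.counting() instead of collecting lists. This is a swapped-in variant, not part of the answer above; the class name DuplicateKeyCounts and the helpers duplicateKeys() and sample() are illustrative:

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateKeyCounts {

    // Minimal stand-in for the question's Person class (assumed shape).
    static class Person {
        private final String name;
        private final Integer age;
        private final String address;

        Person(String name, Integer age, String address) {
            this.name = name;
            this.age = age;
            this.address = address;
        }

        String getName() { return name; }
        Integer getAge() { return age; }
    }

    // Hypothetical helper: the four sample entries from the question.
    static List<Person> sample() {
        return Arrays.asList(
                new Person("name1", 10, "address1"),
                new Person("name2", 20, "address1"),
                new Person("name1", 10, "address2"),
                new Person("name3", 10, "address2"));
    }

    // Hypothetical helper: count occurrences per (name, age) and return keys seen more than once.
    static List<AbstractMap.SimpleImmutableEntry<String, Integer>> duplicateKeys(List<Person> people) {
        Map<AbstractMap.SimpleImmutableEntry<String, Integer>, Long> counts = people.stream()
                .collect(Collectors.groupingBy(
                        p -> new AbstractMap.SimpleImmutableEntry<>(p.getName(), p.getAge()),
                        Collectors.counting()));

        // Keep only the keys whose count is greater than one.
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // SimpleImmutableEntry prints as key=value, so this prints [name1=10]
        System.out.println(duplicateKeys(sample()));
    }
}
```

This avoids holding a list per key, but it only tells you which pairs are duplicated, not which Person objects share them.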

Upvotes: 2

Naman

Reputation: 31988

If the keys are not of much importance once you have grouped and identified the duplicates, you can do the same with a List as the key:

Map<List<?>, List<Person>> nameAgePersonListMap = personList.stream()
        .collect(Collectors.groupingBy(i -> Arrays.asList(i.getName(), i.getAge())));

I said "much" because you can still access the keys; each attribute just has to be cast back to its type to retrieve it. For example, entry -> (String) entry.getKey().get(0) would give back the name used in the grouping.
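The cast-back access described above can be sketched as follows. The element positions must follow the order used in Arrays.asList(name, age). Because List.equals() compares elements in order, a group can also be looked up with a freshly built list. The class name ListKeyAccess and the helpers groupByNameAge(), nameOf(), ageOf() and sample() are illustrative:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ListKeyAccess {

    // Minimal stand-in for the question's Person class (assumed shape).
    static class Person {
        private final String name;
        private final Integer age;
        private final String address;

        Person(String name, Integer age, String address) {
            this.name = name;
            this.age = age;
            this.address = address;
        }

        String getName() { return name; }
        Integer getAge() { return age; }
    }

    // Hypothetical helper: the four sample entries from the question.
    static List<Person> sample() {
        return Arrays.asList(
                new Person("name1", 10, "address1"),
                new Person("name2", 20, "address1"),
                new Person("name1", 10, "address2"),
                new Person("name3", 10, "address2"));
    }

    // Same grouping as in the answer: the key is the List [name, age].
    static Map<List<?>, List<Person>> groupByNameAge(List<Person> people) {
        return people.stream()
                .collect(Collectors.groupingBy(p -> Arrays.asList(p.getName(), p.getAge())));
    }

    // Hypothetical helpers that cast key elements back to their original types.
    static String nameOf(List<?> key) {
        return (String) key.get(0);
    }

    static Integer ageOf(List<?> key) {
        return (Integer) key.get(1);
    }

    public static void main(String[] args) {
        Map<List<?>, List<Person>> groups = groupByNameAge(sample());

        // Prints "name1, 10" for the only duplicated pair.
        groups.entrySet().stream()
                .filter(e -> e.getValue().size() > 1)
                .forEach(e -> System.out.println(nameOf(e.getKey()) + ", " + ageOf(e.getKey())));

        // Lookup with an equal List key: prints 2.
        System.out.println(groups.get(Arrays.asList("name1", 10)).size());
    }
}
```

The casts are unchecked by the compiler, so a mismatch between the key order and the index used only shows up as a ClassCastException at runtime.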

This is especially useful when doing something like:

personList.stream()
        .collect(Collectors.groupingBy(i -> Arrays.asList(i.getName(), i.getAge())))
        .entrySet().stream()
        .filter(entry -> entry.getValue().size() > 1)
        .map(Map.Entry::getValue)
        ...
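The answer leaves the end of this pipeline open. As one possible completion, assuming the goal is a single flat list of every Person involved in a duplicate, the groups can be flattened with flatMap. The terminal steps, the class name FlatDuplicates and the helpers allDuplicates() and sample() are assumptions:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FlatDuplicates {

    // Minimal stand-in for the question's Person class (assumed shape).
    static class Person {
        private final String name;
        private final Integer age;
        private final String address;

        Person(String name, Integer age, String address) {
            this.name = name;
            this.age = age;
            this.address = address;
        }

        String getName() { return name; }
        Integer getAge() { return age; }
        String getAddress() { return address; }

        @Override
        public String toString() {
            return "Person(name=" + name + ", age=" + age + ", address=" + address + ")";
        }
    }

    // Hypothetical helper: the four sample entries from the question.
    static List<Person> sample() {
        return Arrays.asList(
                new Person("name1", 10, "address1"),
                new Person("name2", 20, "address1"),
                new Person("name1", 10, "address2"),
                new Person("name3", 10, "address2"));
    }

    // Every Person whose (name, age) pair occurs more than once, as one flat list.
    static List<Person> allDuplicates(List<Person> people) {
        return people.stream()
                .collect(Collectors.groupingBy(i -> Arrays.asList(i.getName(), i.getAge())))
                .entrySet().stream()
                .filter(entry -> entry.getValue().size() > 1)
                .map(Map.Entry::getValue)
                // Assumed completion: flatten the duplicate groups into a single list.
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // [Person(name=name1, age=10, address=address1), Person(name=name1, age=10, address=address2)]
        System.out.println(allDuplicates(sample()));
    }
}
```

Stopping at .map(Map.Entry::getValue) followed by .collect(Collectors.toList()) would instead keep the groups separate, as in the previous answer.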

Upvotes: 2
