user321068

Finding duplicate entries in Collection

Is there a tool or library that finds duplicate entries in a Collection according to custom criteria that I can implement myself?


To make myself clear: I want to compare the entries to each other according to specific criteria. So I think a Predicate returning just true or false isn't enough.


I can't use equals.

Upvotes: 12

Views: 15702

Answers (7)

Tadhg

Reputation: 590

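TreeSet allows you to do this easily (MyType stands for the element type of objects):

Set<MyType> uniqueItems = new TreeSet<>(yourComparator);
List<MyType> duplicates = objects.stream().filter(o -> !uniqueItems.add(o)).collect(Collectors.toList());

yourComparator is used when calling uniqueItems.add(o), which adds the item to the set and returns true if the set did not already contain an equivalent item. If the comparator considers the item a duplicate of one already in the set, add(o) returns false.

Note that, per the TreeSet documentation, a comparator that is inconsistent with the items' equals method means the set no longer obeys the general Set contract. Its behavior is still well-defined: items that yourComparator treats as equal count as the same element, which is what this approach relies on.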
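
For illustration, here is a minimal self-contained sketch of the same idea. The class name TreeSetDuplicates, the helper findDuplicates and the case-insensitive criterion are assumptions made for this example, not part of the answer above. It also assumes a sequential stream, because the filter mutates the set as it goes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class TreeSetDuplicates {
    // Returns every element for which the comparator has already seen an equivalent one.
    static <T> List<T> findDuplicates(List<T> items, Comparator<? super T> comparator) {
        Set<T> seen = new TreeSet<>(comparator);
        return items.stream()                    // sequential stream, so encounter order is kept
                .filter(item -> !seen.add(item)) // add returns false for an equivalent element
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "Apple", "pear", "APPLE");
        // Hypothetical criterion: strings are duplicates if they match ignoring case.
        System.out.println(findDuplicates(words, String.CASE_INSENSITIVE_ORDER));
        // prints [Apple, APPLE]
    }
}
```

The first element of each equivalence group is treated as the original; only the later ones are reported.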

Upvotes: 0

Nagendra

Reputation: 1

Iterate over the ArrayList that contains the duplicates and add each element to a HashSet. Whenever the set's add method returns false, the element is a duplicate, so log it to the console.
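
A rough sketch of that loop, assuming the elements' own equals and hashCode define what counts as a duplicate (HashSet uses nothing else). The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashSetDuplicates {
    // Collects every element that HashSet.add rejects because an equal one is already present.
    static <T> List<T> findDuplicates(Iterable<T> items) {
        Set<T> seen = new HashSet<>();
        List<T> duplicates = new ArrayList<>();
        for (T item : items) {
            if (!seen.add(item)) {
                duplicates.add(item);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>(List.of("a", "b", "a", "c", "b"));
        for (String duplicate : findDuplicates(values)) {
            System.out.println("Duplicate: " + duplicate); // logs "a", then "b"
        }
    }
}
```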

Upvotes: -2

user321068

I've created a new interface akin to the IEqualityComparer<T> interface in .NET.

I then pass such an EqualsComparator<T> to the following method, which detects duplicates.

public static <T> boolean hasDuplicates(Collection<T> collection,
        EqualsComparator<T> equalsComparator) {
    List<T> list = new ArrayList<>(collection);
    for (int i = 0; i < list.size(); i++) {
        T object1 = list.get(i);
        for (int j = (i + 1); j < list.size(); j++) {
            T object2 = list.get(j);
            if (object1 == object2
                    || equalsComparator.equals(object1, object2)) {
                return true;
            }
        }
    }
    return false;
}

This way I can customise the comparison to my needs.
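
The interface itself is not shown above. As a sketch only, assuming it declares a single two-argument equals method, it and a call might look like this. The wrapper class EqualsComparatorDemo and the case-insensitive criterion are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class EqualsComparatorDemo {
    // Assumed shape of the interface; the original definition is not part of the post.
    @FunctionalInterface
    interface EqualsComparator<T> {
        boolean equals(T first, T second);
    }

    // Same pairwise check as in the post: O(n^2) comparisons, no ordering or hashing needed.
    static <T> boolean hasDuplicates(Collection<T> collection,
            EqualsComparator<T> equalsComparator) {
        List<T> list = new ArrayList<>(collection);
        for (int i = 0; i < list.size(); i++) {
            T object1 = list.get(i);
            for (int j = i + 1; j < list.size(); j++) {
                T object2 = list.get(j);
                if (object1 == object2 || equalsComparator.equals(object1, object2)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical criterion: strings count as equal when they match ignoring case.
        System.out.println(hasDuplicates(List.of("a", "B", "b"), (x, y) -> x.equalsIgnoreCase(y))); // true
        System.out.println(hasDuplicates(List.of("a", "b"), (x, y) -> x.equalsIgnoreCase(y)));      // false
    }
}
```

Because the interface has one abstract method, the criterion can be supplied as a lambda at the call site.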

Upvotes: 2

Andy Thomas

Reputation: 86391

If you want to find duplicates, rather than just removing them, one approach would be to throw the Collection into an array, sort the array via a Comparator that implements your criteria, then linearly walk through the array, looking for adjacent duplicates.

Here's a sketch (not tested):

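   MyComparator myComparator = new MyComparator();
   // toArray() without arguments returns Object[], so pass a typed array
   MyType[] myArray = myList.toArray( new MyType[0] );
   Arrays.sort( myArray, myComparator );
   // After sorting, equivalent elements are adjacent
   for ( int i = 1; i < myArray.length; ++i ) {
      if ( 0 == myComparator.compare( myArray[i - 1], myArray[i] )) {
         // Found a duplicate!
      }
   }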
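
A runnable variant of the sort-and-scan sketch, written against a List instead of an array. The names SortedScanDuplicates and findDuplicates and the compare-by-length criterion are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

class SortedScanDuplicates {
    // Sorts a copy of the input, then reports each element that compares equal to its predecessor.
    static <T> List<T> findDuplicates(Collection<T> items, Comparator<? super T> comparator) {
        List<T> sorted = new ArrayList<>(items);
        sorted.sort(comparator); // stable sort, O(n log n)
        List<T> duplicates = new ArrayList<>();
        for (int i = 1; i < sorted.size(); i++) {
            if (comparator.compare(sorted.get(i - 1), sorted.get(i)) == 0) {
                duplicates.add(sorted.get(i));
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Hypothetical criterion: two words are duplicates if they have the same length.
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        System.out.println(findDuplicates(List.of("kiwi", "fig", "plum", "apple"), byLength));
        // prints [plum]
    }
}
```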

Edit: From your comment, you just want to know if there are duplicates. The approach above works for this too. But you could more simply just create a java.util.SortedSet with a custom Comparator. Here's a sketch:

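   MyComparator myComparator = new MyComparator();
   TreeSet<MyType> treeSet = new TreeSet<>( myComparator );
   treeSet.addAll( myCollection );
   // The set drops elements the comparator considers equal, so a smaller size means duplicates
   boolean containsDuplicates = (treeSet.size() != myCollection.size());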

Upvotes: 4

Sergey Kalinichenko

Reputation: 726499

You can adapt a Java set to search for duplicates among objects of an arbitrary type: wrap your target class in a private wrapper that evaluates equality based on your criteria, and construct a set of wrappers.

Here is a somewhat lengthy example that illustrates the technique. It considers two people with the same first name to be equal, and so it detects three duplicates in the list of five objects.

import java.util.*;
import java.lang.*;

class Main {
    static class Person {
        private String first;
        private String last;
        public String getFirst() {return first;}
        public String getLast() {return last;}
        public Person(String f, String l) {
            first = f;
            last = l;
        }
        public String toString() {
            return first+" "+last;
        }
    }
    public static void main (String[] args) throws java.lang.Exception {
        List<Person> people = new ArrayList<Person>();
        people.add(new Person("John", "Smith"));
        people.add(new Person("John", "Scott"));
        people.add(new Person("Jack", "First"));
        people.add(new Person("John", "Walker"));
        people.add(new Person("Jack", "Black"));
        Set<Object> seen = new HashSet<Object>();
        for (Person p : people) {
            final Person thisPerson = p;
            class Wrap {
                public int hashCode() { return thisPerson.getFirst().hashCode(); }
                public boolean equals(Object o) {
                    Wrap other = (Wrap)o;
                    return other.wrapped().getFirst().equals(thisPerson.getFirst());
                }
                public Person wrapped() { return thisPerson; }
            };
            Wrap wrap = new Wrap();
            if (seen.add(wrap)) {
                System.out.println(p + " is new");
            } else {
                System.out.println(p + " is a duplicate");
            }
        }
    }
}

You can play with this example on ideone [link].

Upvotes: 3

Samuel Rossille

Reputation: 19848

It depends on the semantics of the criterion:

If your criterion is always the same for a given class, and is inherent to the underlying concept, you should just implement equals and hashCode and use a set.

If your criterion depends on the context, org.apache.commons.collections.CollectionUtils.select(java.util.Collection, org.apache.commons.collections.Predicate) might be the right solution for you.

Upvotes: 7

Thomas

Reputation: 88707

You could use a map: while iterating over the collection, put each element into the map under a key built from your criteria (the predicates would form the key). If the map already holds an entry for that key, you've found a duplicate.

For more information see here: Finding duplicates in a collection
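
One way the map approach could look, as a sketch. The key function, the class name KeyMapDuplicates and the first-name criterion are assumptions for this example. Elements are grouped by key, and only groups with more than one member are kept.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class KeyMapDuplicates {
    // Groups items by a key derived from the comparison criteria and keeps only groups of size >= 2.
    static <T, K> Map<K, List<T>> groupDuplicates(Iterable<T> items,
            Function<? super T, ? extends K> keyOf) {
        Map<K, List<T>> groups = new LinkedHashMap<>();
        for (T item : items) {
            groups.computeIfAbsent(keyOf.apply(item), k -> new ArrayList<>()).add(item);
        }
        groups.values().removeIf(group -> group.size() < 2); // drop keys that occurred only once
        return groups;
    }

    public static void main(String[] args) {
        List<String> names = List.of("John Smith", "John Scott", "Jack First", "Mary Jones");
        // Hypothetical criterion: the first name is the key.
        Function<String, String> firstName = s -> s.substring(0, s.indexOf(' '));
        System.out.println(groupDuplicates(names, firstName));
        // prints {John=[John Smith, John Scott]}
    }
}
```

The key type needs usable equals and hashCode of its own, which is usually easy to arrange with strings, numbers or lists of them even when the element class cannot be changed.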

Upvotes: 2
