Reputation:
Is there a tool or library to find duplicate entries in a Collection according to specific criteria that can be implemented?
To make myself clear: I want to compare the entries to each other according to specific criteria. So I think a Predicate
returning just true
or false
isn't enough.
I can't use equals
.
Upvotes: 12
Views: 15702
Reputation: 590
Treeset allows you to do this easily:
Set uniqueItems = new TreeSet<>(yourComparator);
List<?> duplicates = objects.stream().filter(o -> !uniqueItems.add(o)).collect(Collectors.toList());
yourComarator
is used when calling uniqueItems.add(o)
, which adds the item to the set and returns true
if the item is unique. If the comparator considers the item a duplicate, add(o)
will return false.
Note that the item's equals
method must be consistent with yourComarator
as per the TreeSet documentation for this to work.
Upvotes: 0
Reputation: 1
Iterate the ArrayList
which contains duplicates and add them to the HashSet
. When the add method returns false in the HashSet
just log the duplicate to the console.
Upvotes: -2
Reputation:
I've created a new interface akin to the IEqualityComparer<T>
interface in .NET.
Such a EqualityComparator<T>
I then pass to the following method which detects duplicates.
public static <T> boolean hasDuplicates(Collection<T> collection,
EqualsComparator<T> equalsComparator) {
List<T> list = new ArrayList<>(collection);
for (int i = 0; i < list.size(); i++) {
T object1 = list.get(i);
for (int j = (i + 1); j < list.size(); j++) {
T object2 = list.get(j);
if (object1 == object2
|| equalsComparator.equals(object1, object2)) {
return true;
}
}
}
return false;
}
This way I can customise the comparison to my needs.
Upvotes: 2
Reputation: 86391
If you want to find duplicates, rather than just removing them, one approach would be to throw the Collection into an array, sort the array via a Comparator that implements your criteria, then linearly walk through the array, looking for adjacent duplicates.
Here's a sketch (not tested):
MyComparator myComparator = new MyComparator();
MyType[] myArray = myList.toArray();
Arrays.sort( myArray, myComparator );
for ( int i = 1; i < myArray.length; ++i ) {
if ( 0 == myComparator.compare( myArray[i - 1], myArray[i] )) {
// Found a duplicate!
}
}
Edit: From your comment, you just want to know if there are duplicates. The approach above works for this too. But you could more simply just create a java.util.SortedSet with a custom Comparator. Here's a sketch:
MyComparator myComparator = new MyComparator();
TreeSet treeSet = new TreeSet( myComparator );
treeSet.addAll( myCollection );
boolean containsDuplicates = (treeSet.size() != myCollection.size());
Upvotes: 4
Reputation: 726499
You can adapt a Java set to search for duplicates among objects of an arbitrary type: wrap your target class in a private wrapper that evaluates equality based on your criteria, and construct a set of wrappers.
Here is a somewhat lengthy example that illustrates the technique. It considers two people with the same first name to be equal, and so it detects three duplicates in the array of five objects.
import java.util.*;
import java.lang.*;
class Main {
static class Person {
private String first;
private String last;
public String getFirst() {return first;}
public String getLast() {return last;}
public Person(String f, String l) {
first = f;
last = l;
}
public String toString() {
return first+" "+last;
}
}
public static void main (String[] args) throws java.lang.Exception {
List<Person> people = new ArrayList<Person>();
people.add(new Person("John", "Smith"));
people.add(new Person("John", "Scott"));
people.add(new Person("Jack", "First"));
people.add(new Person("John", "Walker"));
people.add(new Person("Jack", "Black"));
Set<Object> seen = new HashSet<Object>();
for (Person p : people) {
final Person thisPerson = p;
class Wrap {
public int hashCode() { return thisPerson.getFirst().hashCode(); }
public boolean equals(Object o) {
Wrap other = (Wrap)o;
return other.wrapped().getFirst().equals(thisPerson.getFirst());
}
public Person wrapped() { return thisPerson; }
};
Wrap wrap = new Wrap();
if (seen.add(wrap)) {
System.out.println(p + " is new");
} else {
System.out.println(p + " is a duplicate");
}
}
}
}
You can play with this example on ideone [link].
Upvotes: 3
Reputation: 19848
It depends on the semantic of the criterion:
If your criterion is always the same for a given class, and is inherent to the underlying concept, you should just implement equals
and hashCode
and use a set.
If your criterion depend on the context, org.apache.commons.collections.CollectionUtils.select(java.util.Collection, org.apache.commons.collections.Predicate) might be the right solution for you.
Upvotes: 7
Reputation: 88707
You could use a map and while iterating over the collection put the elements into the map (the predicates would form the key) and if there's already an entry you've found a duplicate.
For more information see here: Finding duplicates in a collection
Upvotes: 2