Reputation: 390
This code removes the duplicates from the original list, but I want to extract the duplicates from the original list instead of removing them (the package name is just part of another project):
Given a Person POJO:
package at.mavila.learn.kafka.kafkaexercises;

import org.apache.commons.lang3.builder.ToStringBuilder;

public class Person {

    private final Long id;
    private final String firstName;
    private final String secondName;

    private Person(final Builder builder) {
        this.id = builder.id;
        this.firstName = builder.firstName;
        this.secondName = builder.secondName;
    }

    public Long getId() {
        return id;
    }

    public String getFirstName() {
        return firstName;
    }

    public String getSecondName() {
        return secondName;
    }

    public static class Builder {

        private Long id;
        private String firstName;
        private String secondName;

        public Builder id(final Long id) {
            this.id = id;
            return this;
        }

        public Builder firstName(final String firstName) {
            this.firstName = firstName;
            return this;
        }

        public Builder secondName(final String secondName) {
            this.secondName = secondName;
            return this;
        }

        public Person build() {
            return new Person(this);
        }
    }

    @Override
    public String toString() {
        return new ToStringBuilder(this)
                .append("id", id)
                .append("firstName", firstName)
                .append("secondName", secondName)
                .toString();
    }
}
Duplicate extraction code. Notice that here we filter by the id and the first name to retrieve a new list (I saw this code somewhere else, it is not mine):
package at.mavila.learn.kafka.kafkaexercises;

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

import static java.util.Objects.isNull;

public final class DuplicatePersonFilter {

    private DuplicatePersonFilter() {
        // No instances of this class
    }

    public static List<Person> getDuplicates(final List<Person> personList) {
        return personList
                .stream()
                .filter(duplicateByKey(Person::getId))
                .filter(duplicateByKey(Person::getFirstName))
                .collect(Collectors.toList());
    }

    // Stateful predicate: returns true only the first time a key is seen,
    // which is why this keeps distinct elements instead of extracting duplicates.
    private static <T> Predicate<T> duplicateByKey(final Function<? super T, Object> keyExtractor) {
        final Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> isNull(seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE));
    }
}
The test code. If you run this test case you will get [alex, lolita, elpidio, romualdo].
I would expect to get [romualdo, otroRomualdo] instead, as the extracted duplicates given the id and the firstName:
package at.mavila.learn.kafka.kafkaexercises;

import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;

public class DuplicatePersonFilterTest {

    private static final Logger LOGGER = LoggerFactory.getLogger(DuplicatePersonFilterTest.class);

    @Test
    public void testList() {
        Person alex = new Person.Builder().id(1L).firstName("alex").secondName("salgado").build();
        Person lolita = new Person.Builder().id(2L).firstName("lolita").secondName("llanero").build();
        Person elpidio = new Person.Builder().id(3L).firstName("elpidio").secondName("ramirez").build();
        Person romualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("gomez").build();
        Person otroRomualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("perez").build();

        List<Person> personList = new ArrayList<>();
        personList.add(alex);
        personList.add(lolita);
        personList.add(elpidio);
        personList.add(romualdo);
        personList.add(otroRomualdo);

        final List<Person> duplicates = DuplicatePersonFilter.getDuplicates(personList);
        LOGGER.info("Duplicates: {}", duplicates);
    }
}
At work I was able to get the desired result by using a Comparator with a TreeMap and an ArrayList, but that approach created a list, filtered it, and then passed the filter again to a newly created list; the code looked bloated (and probably inefficient).
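Roughly, a sketch of that TreeMap approach (reconstructed for illustration, not the exact code; the helper name is made up):

private static List<Person> getDuplicatesWithTreeMap(final List<Person> personList) {
    // A TreeMap ordered by a Comparator on id and firstName groups "equal" persons together
    final Map<Person, List<Person>> grouped = new TreeMap<>(
            Comparator.comparing(Person::getId).thenComparing(Person::getFirstName));
    personList.forEach(person -> grouped.computeIfAbsent(person, key -> new ArrayList<>()).add(person));
    // Keep only the groups with more than one entry, i.e. the duplicates
    return grouped.values().stream()
            .filter(group -> group.size() > 1)
            .flatMap(List::stream)
            .collect(Collectors.toList());
}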
Does anyone have a better idea of how to extract the duplicates, not remove them?
Thanks in advance.
Update
Thanks everyone for your answers.
To remove the duplicates using the same uniqueAttributes approach (keeping one person per group):
public static List<Person> removeDuplicates(List<Person> personList) {
    return getDuplicatesMap(personList).values().stream()
            .map(duplicates -> duplicates.get(0)) // keep one representative per group, dropping the extra copies
            .collect(Collectors.toList());
}

private static Map<String, List<Person>> getDuplicatesMap(List<Person> personList) {
    return personList.stream().collect(groupingBy(DuplicatePersonFilter::uniqueAttributes));
}

private static String uniqueAttributes(Person person) {
    if (Objects.isNull(person)) {
        return StringUtils.EMPTY;
    }
    return person.getId() + person.getFirstName();
}
Update 2
The answer provided by @brett-ryan is also correct:
public static List<Person> extractDuplicatesWithIdentityCountingV2(final List<Person> personList) {
    // Note: this requires Person to implement equals and hashCode
    return personList.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
            .entrySet().stream()
            .filter(entry -> entry.getValue() > 1)
            .flatMap(entry -> nCopies(entry.getValue().intValue(), entry.getKey()).stream())
            .collect(toList());
}
EDIT
The code above can be found at:
https://gitlab.com/totopoloco/marco_utilities/-/tree/master/duplicates_exercises
Upvotes: 10
Views: 44510
Reputation: 28275
If you could implement equals and hashCode on Person, you could then use a counting downstream collector with groupingBy to get the distinct elements that have been duplicated.
List<Person> duplicates = personList.stream()
        .collect(groupingBy(identity(), counting()))
        .entrySet().stream()
        .filter(entry -> entry.getValue() > 1)
        .map(Map.Entry::getKey)
        .collect(toList());
If you would like a list of the repeated elements in sequence, you can expand this back out using Collections.nCopies. This approach will ensure repeated elements are ordered together.
List<Person> duplicates = personList.stream()
        .collect(groupingBy(identity(), counting()))
        .entrySet().stream()
        .filter(entry -> entry.getValue() > 1)
        .flatMap(entry -> nCopies(entry.getValue().intValue(), entry.getKey()).stream())
        .collect(toList());
Upvotes: 5
Reputation: 4156
A solution based on a generic key:
public static <T> List<T> findDuplicates(List<T> list, Function<T, ?> uniqueKey) {
    if (list == null) {
        return emptyList();
    }
    // Map null keys to a sentinel so groupingBy does not have to deal with them
    Function<T, ?> notNullUniqueKey = el -> uniqueKey.apply(el) == null ? "" : uniqueKey.apply(el);
    return list.stream()
            .collect(groupingBy(notNullUniqueKey))
            .values()
            .stream()
            .filter(matches -> matches.size() > 1)
            .map(matches -> matches.get(0))
            .collect(toList());
}
// Example of usage:
List<Person> duplicates = findDuplicates(list, el -> el.getFirstName());
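For the question's actual condition (same id and firstName), a composite key could be passed instead; a minimal sketch, assuming plain string concatenation with a separator is a good-enough key:

List<Person> duplicates = findDuplicates(list, el -> el.getId() + "|" + el.getFirstName());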
Upvotes: 1
Reputation: 5968
I think you should first override the equals method of the Person class, focusing on id and firstName. After that you can update it, adding a filter for them:
@Override
public int hashCode() {
    return Objects.hash(id, firstName);
}

@Override
public boolean equals(Object obj) {
    if (this == obj) {
        return true;
    }
    if (obj == null) {
        return false;
    }
    if (getClass() != obj.getClass()) {
        return false;
    }
    final Person other = (Person) obj;
    if (!Objects.equals(firstName, other.firstName)) {
        return false;
    }
    if (!Objects.equals(id, other.id)) {
        return false;
    }
    return true;
}
personList
        .stream()
        .filter(p -> Collections.frequency(personList, p) > 1) // keep only persons that occur more than once
        .collect(Collectors.toList());
Upvotes: 3
Reputation: 40078
In this scenario you need to write custom logic to extract the duplicates from the list; this way you will get all the duplicates in the Person list:
public static List<Person> extractDuplicates(final List<Person> personList) {
    return personList.stream().flatMap(i -> {
        final AtomicInteger count = new AtomicInteger();
        final List<Person> duplicatedPersons = new ArrayList<>();
        personList.forEach(p -> {
            // Add i exactly once, at the moment a second occurrence of it is found
            if (p.getId().equals(i.getId()) && p.getFirstName().equals(i.getFirstName())
                    && count.incrementAndGet() == 2) {
                duplicatedPersons.add(i);
            }
        });
        return duplicatedPersons.stream();
    }).collect(Collectors.toList());
}
Applied to:

List<Person> l = new ArrayList<>();
Person alex = new Person.Builder().id(1L).firstName("alex").secondName("salgado").build();
Person lolita = new Person.Builder().id(2L).firstName("lolita").secondName("llanero").build();
Person elpidio = new Person.Builder().id(3L).firstName("elpidio").secondName("ramirez").build();
Person romualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("gomez").build();
Person otroRomualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("perez").build();
l.add(alex);
l.add(lolita);
l.add(elpidio);
l.add(romualdo);
l.add(otroRomualdo);
Output:
[Person [id=4, firstName=romualdo, secondName=gomez], Person [id=4, firstName=romualdo, secondName=perez]]
Upvotes: 3
Reputation: 11988
To identify duplicates, no method I know of is better suited than Collectors.groupingBy(). This allows you to group the list into a map based on a condition of your choice.
Your condition is a combination of id and firstName. Let's extract this part into its own method in Person:
String uniqueAttributes() {
    return id + firstName;
}
The getDuplicates() method is now quite straightforward:
public static List<Person> getDuplicates(final List<Person> personList) {
    return getDuplicatesMap(personList).values().stream()
            .filter(duplicates -> duplicates.size() > 1)
            .flatMap(Collection::stream)
            .collect(Collectors.toList());
}

private static Map<String, List<Person>> getDuplicatesMap(List<Person> personList) {
    return personList.stream().collect(groupingBy(Person::uniqueAttributes));
}
- It calls getDuplicatesMap() to create the map as explained above.
- It filters out the groups that contain only one person.
- flatMap() is used to flatten the stream of lists into one single stream of persons, and collects the stream to a list.

An alternative, if you truly identify persons as equal when they have the same id and firstName, is to go with the solution by Jonathan Johx and implement an equals() method.
Upvotes: 11
Reputation: 46
List<Person> arr = new ArrayList<>();
arr.add(alex);
arr.add(lolita);
arr.add(elpidio);
arr.add(romualdo);
arr.add(otroRomualdo);

// First pass: collect every occurrence after the first one per (firstName, id) key
Set<String> set = new HashSet<>();
List<Person> result = arr.stream()
        .filter(data -> !set.add(data.getFirstName() + ";" + data.getId()))
        .collect(Collectors.toList());
arr.removeAll(result);

// Second pass: pick up the first occurrences of the duplicated keys as well
Set<String> set2 = new HashSet<>();
result.forEach(data -> set2.add(data.getFirstName() + ";" + data.getId()));
List<Person> resultTwo = arr.stream()
        .filter(data -> set2.contains(data.getFirstName() + ";" + data.getId()))
        .collect(Collectors.toList());
result.addAll(resultTwo);
The above code filters based on firstName and id. The result list will contain all the duplicated Person objects.
Upvotes: 0
Reputation: 9415
List<Person> duplicates = personList.stream()
        .collect(Collectors.groupingBy(Person::getId))
        .entrySet().stream()
        .filter(e -> e.getValue().size() > 1)
        .flatMap(e -> e.getValue().stream())
        .collect(Collectors.toList());
That should give you a List of Person where the id has been duplicated.
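If the question's full condition (id plus firstName) is needed, the grouping key could be made composite; a sketch, assuming java.util.Arrays.asList is used to build the key (lists compare by their elements):

List<Person> duplicates = personList.stream()
        .collect(Collectors.groupingBy(p -> Arrays.asList(p.getId(), p.getFirstName())))
        .entrySet().stream()
        .filter(e -> e.getValue().size() > 1)
        .flatMap(e -> e.getValue().stream())
        .collect(Collectors.toList());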
Upvotes: 4