Raj Raichand

Reputation: 97

How to remove duplicates based upon the string's combination?

I want to compare the middle part of getA (e.g. 123, after dropping the first and last three characters) together with getB (e.g. 456) and find duplicate records.

     getA           getB
P1   000123000      456
P2   000123001      456

I tried the code below, but it only finds the duplicates based on the getA and getB combination:

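Map<Object, Boolean> findDuplicates = productsList.stream()
    .collect(Collectors.toMap(
        // key: getB plus the middle of getA (first and last three characters dropped)
        cm -> Arrays.asList(cm.getB(), cm.getA().substring(3, cm.getA().length() - 3)),
        cm -> false,        // first occurrence of a key: not a duplicate yet
        (a, b) -> true));   // key seen again: mark as duplicate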

Now I am trying to remove the record with the lowest cm.getA value in each duplicate group, but I am unable to use a comparator here:

productsList.removeIf(cm -> cm.getA() && findDuplicates .get(Arrays.asList(cm.getB(),cm.getA().substring(3, cm.getA().length() - 3))));

Any help would be appreciated.

Upvotes: 0

Views: 237

Answers (3)

Swapan Pramanick

Reputation: 175

Instead of a map from the duplicate key to a boolean, you can use a map from the duplicate key to a TreeSet. This does the work in one step. Because the elements of a TreeSet always remain sorted, you don't need to sort them in a later step to find the minimum value.

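import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class ProductDups {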

    public static void main(String[] args) {

        List<Product> productsList = new ArrayList<>();
        productsList.add(new Product("000123000", "456"));
        productsList.add(new Product("000123001", "456"));
        productsList.add(new Product("000124003", "567"));
        productsList.add(new Product("000124002", "567"));

        List<Product> minDuplicates = productsList.stream()
                .collect(
                        Collectors.toMap(
                                p -> Arrays.asList(p.getB(),
                                        p.getA().substring(3, p.getA().length() - 3)),
                                p -> {
                                    TreeSet<Product> ts = new TreeSet<>(Comparator.comparing(Product::getA));
                                    ts.add(p);
                                    return ts;
                                },
                                (a, b) -> {
                                    a.addAll(b);
                                    return a;
                                }
                        )
                )
                .entrySet()
                .stream()
                .filter(e -> e.getValue().size() > 1)
                .map(e -> e.getValue().first())
                .collect(Collectors.toList());
        System.out.println(minDuplicates);

    }
}
class Product {

    String a;
    String b;

    public Product(String a, String b) {
        this.a = a;
        this.b = b;
    }

    public String getA() {
        return a;
    }

    public void setA(String a) {
        this.a = a;
    }

    public String getB() {
        return b;
    }

    public void setB(String b) {
        this.b = b;
    }

    @Override
    public String toString() {
        return "Product{" +
                "a='" + a + '\'' +
                ", b='" + b + '\'' +
                '}';
    }
}

Upvotes: 0

Holger

Reputation: 298439

You can do it in two steps:

Function<Product,Object> dupKey = cm ->
    Arrays.asList(cm.getB(), cm.getA().substring(3, cm.getA().length() - 3));

Map<Object, Boolean> duplicates = productsList.stream()
    .collect(Collectors.toMap(dupKey, cm -> false, (a, b) -> true));

Map<Object,Product> minDuplicates = productsList.stream()
    .filter(cm -> duplicates.get(dupKey.apply(cm)))
    .collect(Collectors.toMap(dupKey, Function.identity(),
        BinaryOperator.minBy(Comparator.comparing(Product::getA))));

productsList.removeAll(minDuplicates.values());

First, it identifies the keys which have duplicates. Then, it collects the minimum element for each key, skipping elements that have no duplicates. Finally, it removes the selected values from the list.
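To make the two steps concrete, here is a minimal runnable sketch of the same approach. The nested Product class, the class name TwoStepDemo and the helper name removeMinOfDuplicates are assumptions made up for this example; only the stream logic comes from the snippet above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TwoStepDemo {

    // Assumed minimal stand-in for the question's class; equals is not overridden,
    // so removeAll below matches elements by identity.
    static class Product {
        final String a, b;
        Product(String a, String b) { this.a = a; this.b = b; }
        String getA() { return a; }
        String getB() { return b; }
        @Override public String toString() { return a + "/" + b; }
    }

    // Duplicate key: getB plus the middle of getA (first and last three characters dropped).
    static final Function<Product, Object> DUP_KEY = cm ->
        Arrays.asList(cm.getB(), cm.getA().substring(3, cm.getA().length() - 3));

    // Hypothetical helper: removes, in place, the lowest-getA element of every duplicated key.
    static void removeMinOfDuplicates(List<Product> products) {
        // Step 1: key -> true if more than one element shares that key
        Map<Object, Boolean> duplicates = products.stream()
            .collect(Collectors.toMap(DUP_KEY, cm -> false, (x, y) -> true));
        // Step 2: for duplicated keys only, keep the element with the lowest getA
        Map<Object, Product> minDuplicates = products.stream()
            .filter(cm -> duplicates.get(DUP_KEY.apply(cm)))
            .collect(Collectors.toMap(DUP_KEY, Function.identity(),
                BinaryOperator.minBy(Comparator.comparing(Product::getA))));
        products.removeAll(minDuplicates.values());
    }

    public static void main(String[] args) {
        List<Product> list = new ArrayList<>(Arrays.asList(
            new Product("000123000", "456"),
            new Product("000123001", "456"),
            new Product("000999000", "111")));
        removeMinOfDuplicates(list);
        System.out.println(list); // prints [000123001/456, 000999000/111]
    }
}
```

The record 000999000 has no duplicate, so it is never a removal candidate; of the two records sharing the key (456, 123), only the one with the lower getA is removed.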

In principle, this can be done in one step, but then it requires an object holding both pieces of information: whether there were duplicates for a particular key, and which of the elements with that key has the minimum value:

BinaryOperator<Product> min = BinaryOperator.minBy(Comparator.comparing(Product::getA));

Set<Product> minDuplicates = productsList.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.toMap(dupKey, cm -> Map.entry(false,cm),
            (a, b) -> Map.entry(true, min.apply(a.getValue(), b.getValue()))),
        m -> m.values().stream().filter(Map.Entry::getKey)
              .map(Map.Entry::getValue).collect(Collectors.toSet())));

productsList.removeAll(minDuplicates);

This uses Map.Entry instances to hold two values of different type. For keeping the code readable, it uses Java 9’s Map.entry(K,V) factory method. When support for Java 8 is required, it’s recommended to create your own factory method to keep the code simple:

static <K, V> Map.Entry<K, V> entry(K k, V v) {
    return new AbstractMap.SimpleImmutableEntry<>(k, v);
}

then use that method instead of Map.entry.

The logic stays the same as in the first variant: it maps each value to false and the element itself, and merges colliding entries into true and the minimum element, but now in one go. The filtering has to be done afterwards, to skip the false entries, then map to the minimum element and collect the results into a Set.

Then, using removeAll is the same.
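For reference, a self-contained sketch of the one-step variant that also runs on Java 8, using the custom entry factory shown above. The nested Product class and the names OneStepDemo and minOfDuplicates are assumptions for illustration, and the collect is split into two statements here purely for readability.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

public class OneStepDemo {

    // Assumed minimal stand-in for the question's class.
    static class Product {
        final String a, b;
        Product(String a, String b) { this.a = a; this.b = b; }
        String getA() { return a; }
        String getB() { return b; }
        @Override public String toString() { return a + "/" + b; }
    }

    // Java 8 replacement for Map.entry(K, V)
    static <K, V> Map.Entry<K, V> entry(K k, V v) {
        return new AbstractMap.SimpleImmutableEntry<>(k, v);
    }

    // Hypothetical helper: returns the lowest-getA element of every duplicated key.
    static Set<Product> minOfDuplicates(List<Product> products) {
        Function<Product, Object> dupKey = cm ->
            Arrays.asList(cm.getB(), cm.getA().substring(3, cm.getA().length() - 3));
        BinaryOperator<Product> min =
            BinaryOperator.minBy(Comparator.comparing(Product::getA));

        // One pass over the list: key -> (hasDuplicates, minimum element so far)
        Map<Object, Map.Entry<Boolean, Product>> byKey = products.stream()
            .collect(Collectors.toMap(dupKey, cm -> entry(false, cm),
                (x, y) -> entry(true, min.apply(x.getValue(), y.getValue()))));

        // Keep only entries flagged as duplicated and unwrap the minimum element
        return byKey.values().stream()
            .filter(Map.Entry::getKey)
            .map(Map.Entry::getValue)
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Product> list = new ArrayList<>(Arrays.asList(
            new Product("000123000", "456"),
            new Product("000123001", "456"),
            new Product("000999000", "111")));
        list.removeAll(minOfDuplicates(list));
        System.out.println(list); // prints [000123001/456, 000999000/111]
    }
}
```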

Upvotes: 1

D.Birkel

Reputation: 3

You can put the strings into an array list, then loop through that list and compare each entry with the entries of the other array list, if that is what you are trying to do.
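One possible reading of this loop-based idea, as a rough sketch without streams: a first loop counts the records per key and remembers the one with the lowest a value, and a second loop keeps everything except that lowest record of each duplicated key. The Rec class and the names LoopDedup, dupKey and withoutLowestOfDuplicates are made up for this example, and the key assumes that b does not contain the "|" separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LoopDedup {

    // Assumed minimal record type with the two compared fields.
    static class Rec {
        final String a, b;
        Rec(String a, String b) { this.a = a; this.b = b; }
        @Override public String toString() { return a + "/" + b; }
    }

    // Key: b plus the middle of a (first and last three characters dropped).
    static String dupKey(Rec r) {
        return r.b + "|" + r.a.substring(3, r.a.length() - 3);
    }

    // Returns a new list without the lowest-a record of each duplicated key.
    static List<Rec> withoutLowestOfDuplicates(List<Rec> records) {
        Map<String, Integer> counts = new HashMap<>();
        Map<String, Rec> lowest = new HashMap<>();
        // First loop: count records per key and track the lowest a per key
        for (Rec r : records) {
            String key = dupKey(r);
            counts.merge(key, 1, Integer::sum);
            lowest.merge(key, r, (x, y) -> x.a.compareTo(y.a) <= 0 ? x : y);
        }
        // Second loop: drop a record only if its key is duplicated and it is the lowest one
        List<Rec> kept = new ArrayList<>();
        for (Rec r : records) {
            String key = dupKey(r);
            boolean lowestOfDuplicates = counts.get(key) > 1 && lowest.get(key) == r;
            if (!lowestOfDuplicates) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Rec> records = Arrays.asList(
            new Rec("000123000", "456"),
            new Rec("000123001", "456"),
            new Rec("000124003", "567"),
            new Rec("000124002", "567"),
            new Rec("000999000", "111"));
        System.out.println(withoutLowestOfDuplicates(records));
        // prints [000123001/456, 000124003/567, 000999000/111]
    }
}
```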

Upvotes: 0
