Hai Nguyen

Reputation: 33

How to find duplicate objects by multiple properties and merge them?

I'm writing a verification function for normalizing data between collections in MongoDB. I have a class like this:

class ReleaseTime{
  private Date startDate;
  private Date endDate;
  private List<String> regions;
}

I need to group all ReleaseTime objects that have the same startDate and the same endDate, and then merge their regions lists together.

I have tried the code below, but it only groups by startDate:

expectedAvailabilities = ungrouppedReleaseTime.stream()
            .collect(Collectors.toMap(ReleaseTime::getStartDate,
                    Function.identity(),
                    (ReleaseTime tb1, ReleaseTime tb2) ->
                    {
                        tb1.getRegions().addAll(tb2.getRegions());
                        tb2.getRegions().clear();
                        return tb1;
                    })
            ).values();
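To compare with the attempt above, here is a minimal sketch that keeps toMap but uses both dates as the key. It assumes Java 16+ (for the hypothetical DateRange record used as the key), plus the constructor and getters on ReleaseTime that the question doesn't show. Because the merge function appends into tb1's list, the regions lists must be mutable. The clear() call is dropped because tb2 is discarded anyway.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class ReleaseTime {
    private final Date startDate;
    private final Date endDate;
    private final List<String> regions;

    // Assumed constructor; the question only shows the fields.
    ReleaseTime(Date startDate, Date endDate, List<String> regions) {
        this.startDate = startDate;
        this.endDate = endDate;
        this.regions = regions;
    }

    Date getStartDate() { return startDate; }
    Date getEndDate() { return endDate; }
    List<String> getRegions() { return regions; }
}

// Hypothetical composite key: two keys are equal only if both dates are equal.
record DateRange(Date start, Date end) {}

class GroupByBothDates {
    static Collection<ReleaseTime> merge(List<ReleaseTime> ungrouped) {
        return ungrouped.stream()
                .collect(Collectors.toMap(
                        t -> new DateRange(t.getStartDate(), t.getEndDate()),
                        Function.identity(),
                        (tb1, tb2) -> {
                            // Same idea as the question's merge: appends into tb1's list (mutates the input).
                            tb1.getRegions().addAll(tb2.getRegions());
                            return tb1;
                        },
                        LinkedHashMap::new)) // keep first-seen key order
                .values();
    }

    public static void main(String[] args) {
        Date d0 = new Date(0L);
        Date d1 = new Date(86_400_000L);
        Date d2 = new Date(172_800_000L);
        List<ReleaseTime> input = List.of(
                new ReleaseTime(d0, d1, new ArrayList<>(List.of("US"))),
                new ReleaseTime(d0, d2, new ArrayList<>(List.of("EU"))),
                new ReleaseTime(d0, d1, new ArrayList<>(List.of("JP"))));
        for (ReleaseTime t : merge(input)) {
            System.out.println(t.getRegions()); // [US, JP] then [EU]
        }
    }
}
```

Records derive equals and hashCode from their components, and Date compares by its millisecond value. As a result, two ReleaseTime objects share a key only when both dates match.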

Thanks for your help!

Upvotes: 3

Views: 367

Answers (2)

fps

Reputation: 34470

Here's an alternative way of doing what you want without using streams:

Map<List<Date>, List<String>> map = new LinkedHashMap<>();
ungrouppedAvailabilites.forEach(a ->
    map.computeIfAbsent(Arrays.asList(a.getStartDate(), a.getEndDate()), // or List.of
                        k -> new ArrayList<>())
       .addAll(a.getRegions()));

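This uses Map.computeIfAbsent to group the regions of ReleaseTime objects by start and end date. The key is a two-element list, and list equality compares elements, so two objects with equal dates map to the same key.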
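Here is a self-contained sketch of the computeIfAbsent approach. The ReleaseTime constructor and getters are assumptions, and the sample dates are arbitrary. Unlike the toMap version in the question, this approach never modifies the input objects, so unmodifiable regions lists are fine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ReleaseTime {
    private final Date startDate;
    private final Date endDate;
    private final List<String> regions;

    // Assumed constructor and getters.
    ReleaseTime(Date startDate, Date endDate, List<String> regions) {
        this.startDate = startDate;
        this.endDate = endDate;
        this.regions = regions;
    }

    Date getStartDate() { return startDate; }
    Date getEndDate() { return endDate; }
    List<String> getRegions() { return regions; }
}

class GroupRegions {
    // Maps [startDate, endDate] to all regions of the matching ReleaseTime objects.
    static Map<List<Date>, List<String>> group(List<ReleaseTime> ungrouped) {
        Map<List<Date>, List<String>> map = new LinkedHashMap<>();
        ungrouped.forEach(a ->
                map.computeIfAbsent(Arrays.asList(a.getStartDate(), a.getEndDate()),
                                    k -> new ArrayList<>()) // new list the first time a key is seen
                   .addAll(a.getRegions()));               // then append this object's regions
        return map;
    }

    public static void main(String[] args) {
        Date d0 = new Date(0L), d1 = new Date(86_400_000L), d2 = new Date(172_800_000L);
        List<ReleaseTime> input = List.of(
                new ReleaseTime(d0, d1, List.of("US", "CA")),
                new ReleaseTime(d0, d2, List.of("EU")),
                new ReleaseTime(d0, d1, List.of("US", "JP")));
        group(input).values().forEach(System.out::println); // [US, CA, US, JP] then [EU]
    }
}
```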

In case there are repeated regions among the grouped ReleaseTime objects and you don't want duplicates, you could use a Set instead of a List:

Map<List<Date>, Set<String>> map = new LinkedHashMap<>();
ungrouppedAvailabilites.forEach(a ->
    map.computeIfAbsent(Arrays.asList(a.getStartDate(), a.getEndDate()), // or List.of
                        k -> new LinkedHashSet<>())
       .addAll(a.getRegions()));

Note that I'm using LinkedHashMap and LinkedHashSet to keep elements in insertion order.
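If you prefer streams, the LinkedHashMap/LinkedHashSet variant can be sketched with Collectors.groupingBy plus Collectors.flatMapping (Java 9+). This is a swapped-in alternative rather than the loop above. The ReleaseTime constructor and getters are again assumed.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class ReleaseTime {
    private final Date startDate;
    private final Date endDate;
    private final List<String> regions;

    // Assumed constructor and getters.
    ReleaseTime(Date startDate, Date endDate, List<String> regions) {
        this.startDate = startDate;
        this.endDate = endDate;
        this.regions = regions;
    }

    Date getStartDate() { return startDate; }
    Date getEndDate() { return endDate; }
    List<String> getRegions() { return regions; }
}

class GroupDistinctRegions {
    // Groups by [startDate, endDate] and collects each group's regions without duplicates.
    static Map<List<Date>, Set<String>> group(List<ReleaseTime> ungrouped) {
        return ungrouped.stream()
                .collect(Collectors.groupingBy(
                        a -> Arrays.asList(a.getStartDate(), a.getEndDate()), // composite key
                        LinkedHashMap::new,                                    // keep key order
                        Collectors.flatMapping(
                                a -> a.getRegions().stream(),                  // flatten regions
                                Collectors.<String, Set<String>>toCollection(LinkedHashSet::new))));
    }

    public static void main(String[] args) {
        Date d0 = new Date(0L), d1 = new Date(86_400_000L), d2 = new Date(172_800_000L);
        List<ReleaseTime> input = List.of(
                new ReleaseTime(d0, d1, List.of("US", "CA")),
                new ReleaseTime(d0, d2, List.of("EU")),
                new ReleaseTime(d0, d1, List.of("US", "JP")));
        group(input).values().forEach(System.out::println); // [US, CA, JP] then [EU]
    }
}
```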


EDIT:

If you need ReleaseTime objects instead of only their regions, you could achieve it with one extra step:

Map<List<Date>, ReleaseTime> result = new LinkedHashMap<>();
map.forEach((k, v) -> 
    result.put(k, new ReleaseTime(k.get(0), k.get(1), new ArrayList<>(v))));

This assumes there's a constructor for ReleaseTime that receives all the attributes:

public ReleaseTime(Date startDate, Date endDate, List<String> regions) {
    this.startDate = startDate;
    this.endDate = endDate;
    this.regions = regions;
}
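Here is a minimal sketch that puts the extra step together. The toReleaseTimes helper is hypothetical. It accepts either the List or the Set variant of the map, and it relies on k.get(0) and k.get(1) being the start and end dates, in the order the key was built.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReleaseTime {
    private final Date startDate;
    private final Date endDate;
    private final List<String> regions;

    public ReleaseTime(Date startDate, Date endDate, List<String> regions) {
        this.startDate = startDate;
        this.endDate = endDate;
        this.regions = regions;
    }

    Date getStartDate() { return startDate; }
    Date getEndDate() { return endDate; }
    List<String> getRegions() { return regions; }
}

class ToReleaseTimes {
    // Hypothetical helper: turns each [startDate, endDate] -> regions entry back into a ReleaseTime.
    static Map<List<Date>, ReleaseTime> toReleaseTimes(Map<List<Date>, ? extends Collection<String>> map) {
        Map<List<Date>, ReleaseTime> result = new LinkedHashMap<>();
        map.forEach((k, v) ->
                // Copy into a fresh ArrayList so the result works for both List and Set values.
                result.put(k, new ReleaseTime(k.get(0), k.get(1), new ArrayList<>(v))));
        return result;
    }

    public static void main(String[] args) {
        Date d0 = new Date(0L), d1 = new Date(86_400_000L);
        Map<List<Date>, Set<String>> grouped = new LinkedHashMap<>();
        grouped.put(Arrays.asList(d0, d1), new LinkedHashSet<>(List.of("US", "CA", "JP")));
        ReleaseTime rt = toReleaseTimes(grouped).get(Arrays.asList(d0, d1));
        System.out.println(rt.getRegions()); // [US, CA, JP]
    }
}
```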

Upvotes: 2

Naman

Reputation: 31978

You can group with toMap, using a key made of both dates:

// Collection<ReleaseTime> ungrouppedAvailabilites...
Collection<ReleaseTime> mergedRegionsCollection = ungrouppedAvailabilites.stream()
        .collect(Collectors.toMap(t -> Arrays.asList(t.getStartDate(), t.getEndDate()),
                Function.identity(), ReleaseTime::mergeRegions))
        .values();

where mergeRegions is an instance method of ReleaseTime, implemented as:

ReleaseTime mergeRegions(ReleaseTime that) {
    List<String> mergedRegions = this.getRegions();
    mergedRegions.addAll(that.getRegions());
    return new ReleaseTime(this.startDate, this.endDate, mergedRegions);
}
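Here is a runnable sketch of the toMap version with this first mergeRegions. It assumes ReleaseTime has the three-argument constructor and getters. The sketch makes the side effect visible: the merged result shares the first object's regions list, so that input object is changed too. With an unmodifiable list (for example, one from List.of), the addAll call would throw UnsupportedOperationException, so the sample uses ArrayList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Date;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class ReleaseTime {
    private final Date startDate;
    private final Date endDate;
    private final List<String> regions;

    ReleaseTime(Date startDate, Date endDate, List<String> regions) {
        this.startDate = startDate;
        this.endDate = endDate;
        this.regions = regions;
    }

    Date getStartDate() { return startDate; }
    Date getEndDate() { return endDate; }
    List<String> getRegions() { return regions; }

    // First mergeRegions from the answer: appends into this object's own list.
    ReleaseTime mergeRegions(ReleaseTime that) {
        List<String> mergedRegions = this.getRegions();
        mergedRegions.addAll(that.getRegions());
        return new ReleaseTime(this.startDate, this.endDate, mergedRegions);
    }
}

class MergeWithToMap {
    static Collection<ReleaseTime> merge(List<ReleaseTime> ungrouped) {
        return ungrouped.stream()
                .collect(Collectors.toMap(t -> Arrays.asList(t.getStartDate(), t.getEndDate()),
                        Function.identity(), ReleaseTime::mergeRegions))
                .values();
    }

    public static void main(String[] args) {
        Date d0 = new Date(0L), d1 = new Date(86_400_000L);
        List<ReleaseTime> input = List.of(
                new ReleaseTime(d0, d1, new ArrayList<>(List.of("US"))),
                new ReleaseTime(d0, d1, new ArrayList<>(List.of("JP"))));
        Collection<ReleaseTime> merged = merge(input);
        System.out.println(merged.iterator().next().getRegions()); // [US, JP]
        System.out.println(input.get(0).getRegions());           // also [US, JP]: the input was modified
    }
}
```

Because toMap without a map factory returns a HashMap, the order of the merged values is not guaranteed when there are several keys.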

Note: to avoid mutating the existing objects, you can implement it as:

ReleaseTime mergeRegions(ReleaseTime that) {
    return new ReleaseTime(this.startDate, this.endDate,
            Stream.concat(this.getRegions().stream(), that.getRegions().stream())
                    .collect(Collectors.toList()));
}
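Here is the same setup with the non-mutating mergeRegions, as a sketch under the same assumptions. Each merge builds a fresh list, so the inputs keep their original regions, and unmodifiable lists work as well.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Date;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReleaseTime {
    private final Date startDate;
    private final Date endDate;
    private final List<String> regions;

    ReleaseTime(Date startDate, Date endDate, List<String> regions) {
        this.startDate = startDate;
        this.endDate = endDate;
        this.regions = regions;
    }

    Date getStartDate() { return startDate; }
    Date getEndDate() { return endDate; }
    List<String> getRegions() { return regions; }

    // Non-mutating mergeRegions: concatenates both lists into a new one.
    ReleaseTime mergeRegions(ReleaseTime that) {
        return new ReleaseTime(this.startDate, this.endDate,
                Stream.concat(this.getRegions().stream(), that.getRegions().stream())
                        .collect(Collectors.toList()));
    }
}

class MergeWithoutMutation {
    static Collection<ReleaseTime> merge(List<ReleaseTime> ungrouped) {
        return ungrouped.stream()
                .collect(Collectors.toMap(t -> Arrays.asList(t.getStartDate(), t.getEndDate()),
                        Function.identity(), ReleaseTime::mergeRegions))
                .values();
    }

    public static void main(String[] args) {
        Date d0 = new Date(0L), d1 = new Date(86_400_000L);
        List<ReleaseTime> input = List.of(
                new ReleaseTime(d0, d1, List.of("US")),   // unmodifiable lists are fine here
                new ReleaseTime(d0, d1, List.of("JP")));
        System.out.println(merge(input).iterator().next().getRegions()); // [US, JP]
        System.out.println(input.get(0).getRegions());                   // still [US]
    }
}
```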

Upvotes: 2
