gansvv

Reputation: 21

Remove duplicates based on a few object properties from list

I have a List<Metric> collection where each Metric has several properties, such as metricName, namespace, fleet, type, component, firstSeenTime, and lastSeenTime. The list contains duplicates: entries whose properties are all the same except for firstSeenTime and lastSeenTime. I am looking for an elegant way to filter this list so that, for each group of such duplicates, only the metric with the most recent lastSeenTime is returned.

Something better than this:

private List<Metric> processResults(List<Metric> metrics) {
    List<Metric> results = new ArrayList<>();

    for (Metric incomingMetric: metrics) {

        // We need to implement "contains" below so that only properties
        // other than the two dates are checked.
        if (results.contains(incomingMetric)) {
            int index = results.indexOf(incomingMetric);
            Metric existing = results.get(index); 
            if (incomingMetric.getLastSeen().after(existing.getLastSeen())) {
                results.set(index, incomingMetric);
            } else {
                // do nothing, metric in results is already the latest 
            }
        } else {
            // add incomingMetric to results for the first time
            results.add(incomingMetric);
        }
    }

    return results;
}

The results.contains check is done by iterating over all the Metrics in results and checking if each object matches the properties except for the two dates.

What would be a better approach than this, in terms of both elegance and performance?

Upvotes: 0

Views: 679

Answers (3)

gansvv

Reputation: 21

Thanks for the answers. I went with the map approach since it does not incur additional sorts and copies.

@VisibleForTesting
Set<Metric> removeDuplicates(List<Metric> metrics) {

    Map<RawMetric, Metric> metricsMap = new HashMap<>();
    for (Metric metric : metrics) {
        // key built from every property except the two dates
        RawMetric rawMetric = RawMetric.builder()
                .metricName(metric.getName())
                .metricType(metric.getMetricType())
                ... // and more
                .build();

        // pick the latest updated metric (based on lastSeen date)
        BiFunction<RawMetric, Metric, Metric> biFunction =
            (k, v) -> Metric.builder()
                    .name(k.getMetricName())
                    .metricType(k.getMetricType())
                    ... // and more
                    .lastSeen(v.getLastSeen().after(
                        metric.getLastSeen()) ? v.getLastSeen() :
                            metric.getLastSeen())
                    .firstSeen(v.getFirstSeen())
                    .build();

        metricsMap.putIfAbsent(rawMetric, metric);
        metricsMap.computeIfPresent(rawMetric, biFunction);
    }

    return ImmutableSet.copyOf(metricsMap.values());
}

@Value
@Builder
static class RawMetric {
    private String metricName;
    private String metricType;
    private String ad;
    private String project;
    private String fleet;
    private String host;
    private int granularity;
}
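
For comparison, the putIfAbsent/computeIfPresent pair can usually be collapsed into a single Map.merge call. The following is only a sketch under assumptions: MergeMetric and MergeDedup are hypothetical stand-ins, only two identity fields are shown, and a List of those fields replaces the Lombok-generated key class. Unlike the code above, it keeps the latest Metric object whole instead of rebuilding one, so the earlier firstSeen is not carried over.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical, trimmed-down stand-in for Metric (two identity fields only).
class MergeMetric {
    private final String name;
    private final String metricType;
    private final Date lastSeen;

    MergeMetric(String name, String metricType, Date lastSeen) {
        this.name = name;
        this.metricType = metricType;
        this.lastSeen = lastSeen;
    }

    String getName() { return name; }
    String getMetricType() { return metricType; }
    Date getLastSeen() { return lastSeen; }
}

class MergeDedup {
    static List<MergeMetric> removeDuplicates(List<MergeMetric> metrics) {
        Comparator<MergeMetric> byLastSeen = Comparator.comparing(MergeMetric::getLastSeen);
        // Picks whichever of two metrics has the later lastSeen (the stored one on a tie).
        BinaryOperator<MergeMetric> latest = BinaryOperator.maxBy(byLastSeen);

        // LinkedHashMap keeps the order in which each identity was first seen.
        Map<List<Object>, MergeMetric> byIdentity = new LinkedHashMap<>();
        for (MergeMetric metric : metrics) {
            // Assumption: a List of the identity fields stands in for RawMetric.
            // Lists compare element by element, so they work as map keys.
            List<Object> key = Arrays.asList(metric.getName(), metric.getMetricType());
            // merge() stores the metric if the key is new, otherwise keeps the later one.
            byIdentity.merge(key, metric, latest);
        }
        return new ArrayList<>(byIdentity.values());
    }
}
```

If the merged result must combine the newest lastSeen with the oldest firstSeen, the BinaryOperator passed to merge would need to build a new object from both arguments instead of picking one.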

Upvotes: 0

Neero

Reputation: 226

I’m not sure how you are generating the List<Metric>, but if you can maintain a Map<String, Metric> instead of that list, you could try the approach below.

The key of this map is a combination of all the values you need to compare (everything except the date attributes).

Key: “{metricName}${type}$.....”

For this, you can add another attribute to the Metric object, with a getter that returns the key.

Then check whether the key exists before you put a metric into the map. If it exists, get the Metric stored under that key and compare the dates to find the latest one. If the new object is the latest, replace the stored object with it.

PS: Compare the execution time of both approaches to find the best one.
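
The steps above could be sketched roughly as follows. KeyedMetric, dedupKey and KeyedDedup are hypothetical names chosen for illustration, and only two of the identity properties are included in the key. Note that a string key like this is only safe if the "$" separator cannot occur inside the property values.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, trimmed-down Metric: two identity fields plus lastSeen.
class KeyedMetric {
    final String metricName;
    final String type;
    final Date lastSeen;

    KeyedMetric(String metricName, String type, Date lastSeen) {
        this.metricName = metricName;
        this.type = type;
        this.lastSeen = lastSeen;
    }

    // Composite key built from the properties that define a duplicate
    // (everything except the dates).
    String dedupKey() {
        return metricName + "$" + type;
    }
}

class KeyedDedup {
    // Keeps, for each key, the metric with the most recent lastSeen.
    static List<KeyedMetric> latestPerKey(List<KeyedMetric> metrics) {
        Map<String, KeyedMetric> latest = new LinkedHashMap<>();
        for (KeyedMetric incoming : metrics) {
            String key = incoming.dedupKey();
            KeyedMetric existing = latest.get(key);
            // Store the metric if the key is new or the incoming one is more recent.
            if (existing == null || incoming.lastSeen.after(existing.lastSeen)) {
                latest.put(key, incoming);
            }
        }
        return new ArrayList<>(latest.values());
    }
}
```

Each metric costs one hash lookup here, so the whole pass is roughly linear in the size of the list, compared with the quadratic contains/indexOf scan in the question.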

Upvotes: 1

Duloren

Reputation: 2711

In Java, the most elegant way to compare things is the Comparator interface. You could remove the duplicates using something like:

public List<Metric> removeDuplicates(List<Metric> metrics) {

    List<Metric> copy = new ArrayList<>(metrics);
    //first sort the metrics list from most recent to older
    Collections.sort(copy, new SortComparator());

    Set<Metric> set = new TreeSet<Metric>(new Comparator<Metric>() {

        @Override
        public int compare(Metric o1, Metric o2) {
            int result = 0;
            // compare the two metrics given your rules: return 0 only when
            // they are duplicates (all properties except the dates are equal)
            return result;
        }
    });

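    // TreeSet.add leaves the set unchanged when an equal element is already
    // present, so only the first (most recent) metric of each group is kept
    for(Metric metric : copy) {
        set.add(metric);
    }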

    // Arrays.asList(set.toArray()) would yield a List<Object>, so copy the set instead
    List<Metric> result = new ArrayList<>(set);
    return result;
}

class SortComparator implements Comparator<Metric> {

    @Override
    public int compare(Metric o1, Metric o2) {
        int result = 0;
        if(o2.getLastSeenTime() != null && o1.getLastSeenTime() != null) {
            result = o2.getLastSeenTime().compareTo(o1.getLastSeenTime());
        }
        return result;
    }

}

The strength of this approach is that you can write a family of comparators and provide a factory that chooses, at runtime, the best way to compare your metrics and to decide which instances count as duplicates under the current conditions:

public List<Metric> removeDuplicates(List<Metric> metrics, Comparator<Metric> comparator) {

    List<Metric> copy = new ArrayList<>(metrics);
    Collections.sort(copy, new SortComparator());

    Set<Metric> set = new TreeSet<Metric>(comparator);
    for(Metric metric : copy) {
        set.add(metric);
    }
    return new ArrayList<>(set);
}
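
As an illustration of what such a duplicate-detecting comparator might look like, here is a sketch built with Comparator.comparing and thenComparing. TreeMetric and TreeSetDedup are hypothetical names, only two identity properties are compared, and the fields are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Date;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, trimmed-down Metric used only for this sketch.
class TreeMetric {
    private final String metricName;
    private final String type;
    private final Date lastSeenTime;

    TreeMetric(String metricName, String type, Date lastSeenTime) {
        this.metricName = metricName;
        this.type = type;
        this.lastSeenTime = lastSeenTime;
    }

    String getMetricName() { return metricName; }
    String getType() { return type; }
    Date getLastSeenTime() { return lastSeenTime; }
}

class TreeSetDedup {
    // Returns 0 only when every compared (non-date) property is equal,
    // which is what makes two metrics duplicates in the TreeSet.
    static final Comparator<TreeMetric> IDENTITY =
            Comparator.comparing(TreeMetric::getMetricName)
                    .thenComparing(TreeMetric::getType);

    static List<TreeMetric> removeDuplicates(List<TreeMetric> metrics) {
        List<TreeMetric> copy = new ArrayList<>(metrics);
        // Most recent first, so the first metric added per group is the latest.
        copy.sort(Comparator.comparing(TreeMetric::getLastSeenTime).reversed());

        Set<TreeMetric> set = new TreeSet<>(IDENTITY);
        for (TreeMetric metric : copy) {
            set.add(metric); // ignored if an equal metric is already present
        }
        return new ArrayList<>(set);
    }
}
```

The returned list is ordered by the identity comparator (name, then type), not by the original list order or by date.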

Upvotes: 1
