Reputation: 21
I have a List<Metric> collection where each Metric contains several properties, such as metricName, namespace, fleet, type, component, firstSeenTime, lastSeenTime, etc. The list contains duplicates: entries whose properties are all the same except for firstSeenTime and lastSeenTime. I am looking for an elegant way to filter this list so that, for each group of duplicates, only the metric with the most recent lastSeenTime is returned.
Something better than this:
private List<Metric> processResults(List<Metric> metrics) {
    List<Metric> results = new ArrayList<>();
    for (Metric incomingMetric : metrics) {
        // We need to implement "contains" below so that only properties
        // other than the two dates are checked.
        if (results.contains(incomingMetric)) {
            int index = results.indexOf(incomingMetric);
            Metric existing = results.get(index);
            if (incomingMetric.getLastSeen().after(existing.getLastSeen())) {
                // the incoming metric is newer, so it replaces the stored one
                results.set(index, incomingMetric);
            } else {
                // do nothing, metric in results is already the latest
            }
        } else {
            // add incomingMetric to results for the first time
            results.add(incomingMetric);
        }
    }
    return results;
}
The results.contains check is done by iterating over all the Metrics in results and checking if each object matches the properties except for the two dates.
What would be a better approach than this, in terms of both elegance and performance?
Upvotes: 0
Views: 679
Reputation: 21
Thanks for the answers. I went with the map approach since it does not incur additional sorts and copies.
@VisibleForTesting
Set<Metric> removeDuplicates(List<Metric> metrics) {
    Map<RawMetric, Metric> metricsMap = new HashMap<>();
    for (Metric metric : metrics) {
        // key built from every property except the two dates
        RawMetric rawMetric = RawMetric.builder()
                .metricName(metric.getName())
                .metricType(metric.getMetricType())
                ... // and more
                .build();
        // pick the latest updated metric (based on lastSeen date)
        BiFunction<RawMetric, Metric, Metric> biFunction =
                (k, v) -> Metric.builder()
                        .name(k.getMetricName())
                        .metricType(k.getMetricType())
                        ... // and more
                        .lastSeen(v.getLastSeen().after(metric.getLastSeen())
                                ? v.getLastSeen()
                                : metric.getLastSeen())
                        .firstSeen(v.getFirstSeen())
                        .build();
        metricsMap.putIfAbsent(rawMetric, metric);
        metricsMap.computeIfPresent(rawMetric, biFunction);
    }
    return ImmutableSet.copyOf(metricsMap.values());
}
@Value
@Builder
static class RawMetric {
    private String metricName;
    private String metricType;
    private String ad;
    private String project;
    private String fleet;
    private String host;
    private int granularity;
}
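As a possible simplification, the putIfAbsent/computeIfPresent pair can be expressed with a single Map.merge call. The sketch below is not the code from this answer: SimpleMetric and SimpleKey are hypothetical stand-ins for Metric and RawMetric, reduced to two identity fields, and java.util.Date is assumed for the timestamps. Unlike the builder-based version, it keeps the whole newer object rather than combining the firstSeen of the stored entry with the latest lastSeen.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for Metric: two identity fields plus the two dates.
final class SimpleMetric {
    final String name;
    final String metricType;
    final Date firstSeen;
    final Date lastSeen;

    SimpleMetric(String name, String metricType, Date firstSeen, Date lastSeen) {
        this.name = name;
        this.metricType = metricType;
        this.firstSeen = firstSeen;
        this.lastSeen = lastSeen;
    }
}

// Hypothetical stand-in for RawMetric: identity fields only, usable as a map key.
final class SimpleKey {
    final String name;
    final String metricType;

    SimpleKey(String name, String metricType) {
        this.name = name;
        this.metricType = metricType;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SimpleKey)) {
            return false;
        }
        SimpleKey other = (SimpleKey) o;
        return Objects.equals(name, other.name)
                && Objects.equals(metricType, other.metricType);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, metricType);
    }
}

final class MergeDeduplicator {
    static List<SimpleMetric> removeDuplicates(List<SimpleMetric> metrics) {
        Map<SimpleKey, SimpleMetric> byKey = new HashMap<>();
        for (SimpleMetric incoming : metrics) {
            // merge() stores incoming when the key is new; otherwise the
            // function picks whichever metric has the later lastSeen.
            byKey.merge(new SimpleKey(incoming.name, incoming.metricType), incoming,
                    (existing, candidate) ->
                            candidate.lastSeen.after(existing.lastSeen) ? candidate : existing);
        }
        return new ArrayList<>(byKey.values());
    }
}
```

Calling MergeDeduplicator.removeDuplicates(metrics) returns one entry per key. Because a HashMap is used, the order of the returned list is unspecified.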
Upvotes: 0
Reputation: 226
I'm not sure how you are generating the List<Metric>, but if you can maintain a Map<String, Metric> instead of that list, you can try the approach below.
The key of this map is a combination of all the values you need to compare (everything except the date attributes).
Key: “{metricName}${type}$.....”
For this, you can add another attribute with a getter to the Metric object. Calling the getter returns the key.
Then check whether the key exists before you put a metric into the map. If it exists, get the Metric stored in the map for that key and compare the dates to find the latest Metric object. If the new one is the latest, replace the object stored in the map with it.
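A minimal sketch of this idea, under stated assumptions: KeyedMetric is a hypothetical, trimmed-down Metric with only three identity fields, key() is the hypothetical getter, java.util.Date is assumed for lastSeen, and '$' is assumed never to occur inside the field values.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, trimmed-down Metric: three identity fields plus lastSeen.
final class KeyedMetric {
    final String metricName;
    final String namespace;
    final String type;
    final Date lastSeen;

    KeyedMetric(String metricName, String namespace, String type, Date lastSeen) {
        this.metricName = metricName;
        this.namespace = namespace;
        this.type = type;
        this.lastSeen = lastSeen;
    }

    // Combines every field that defines identity; the dates are left out on purpose.
    String key() {
        return metricName + "$" + namespace + "$" + type;
    }
}

final class KeyedDeduplicator {
    static List<KeyedMetric> latestPerKey(List<KeyedMetric> metrics) {
        // LinkedHashMap keeps the order in which each key was first seen.
        Map<String, KeyedMetric> latest = new LinkedHashMap<>();
        for (KeyedMetric incoming : metrics) {
            KeyedMetric existing = latest.get(incoming.key());
            // Store the metric if the key is new or the incoming one is more recent.
            if (existing == null || incoming.lastSeen.after(existing.lastSeen)) {
                latest.put(incoming.key(), incoming);
            }
        }
        return new ArrayList<>(latest.values());
    }
}
```

Each metric costs one hash lookup here, instead of a scan over all results collected so far.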
PS: Compare the execution time of both approaches to find out which one works best for you.
Upvotes: 1
Reputation: 2711
In Java, the most elegant way to compare things is the Comparator interface. You can remove the duplicates using something like:
public List<Metric> removeDuplicates(List<Metric> metrics) {
    List<Metric> copy = new ArrayList<>(metrics);
    // first sort the copy from most recent to oldest
    Collections.sort(copy, new SortComparator());
    // the TreeSet treats two metrics as duplicates when compare() returns 0
    Set<Metric> set = new TreeSet<Metric>(new Comparator<Metric>() {
        @Override
        public int compare(Metric o1, Metric o2) {
            int result = 0;
            // compare the two metrics given your rules
            return result;
        }
    });
    // add() ignores an element that compares equal to one already in the set,
    // so the most recent metric of each group is the one that is kept
    for (Metric metric : copy) {
        set.add(metric);
    }
    List<Metric> result = new ArrayList<>(set);
    return result;
}
class SortComparator implements Comparator<Metric> {
    @Override
    public int compare(Metric o1, Metric o2) {
        int result = 0;
        // descending order by lastSeenTime; null times are treated as equal
        if (o2.getLastSeenTime() != null && o1.getLastSeenTime() != null) {
            result = o2.getLastSeenTime().compareTo(o1.getLastSeenTime());
        }
        return result;
    }
}
The strength of this approach is that you can write a family of comparators and provide a factory that chooses, at runtime, how to compare your metrics and which instances to treat as duplicates, depending on the runtime conditions:
public List<Metric> removeDuplicates(List<Metric> metrics, Comparator<Metric> comparator) {
    List<Metric> copy = new ArrayList<>(metrics);
    // most recent first, so the TreeSet keeps the latest metric of each group
    Collections.sort(copy, new SortComparator());
    Set<Metric> set = new TreeSet<Metric>(comparator);
    for (Metric metric : copy) {
        set.add(metric);
    }
    List<Metric> result = new ArrayList<>(set);
    return result;
}
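To make the placeholder comparator concrete, here is a sketch of one possible "identity" comparator built with Comparator.comparing. TimedMetric is a hypothetical Metric reduced to two identity fields, the names IDENTITY and NEWEST_FIRST are illustrative, and lastSeenTime is assumed to be a non-null java.util.Date.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Date;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical Metric with two identity fields and a non-null lastSeenTime.
final class TimedMetric {
    private final String metricName;
    private final String type;
    private final Date lastSeenTime;

    TimedMetric(String metricName, String type, Date lastSeenTime) {
        this.metricName = metricName;
        this.type = type;
        this.lastSeenTime = lastSeenTime;
    }

    String getMetricName() { return metricName; }
    String getType() { return type; }
    Date getLastSeenTime() { return lastSeenTime; }
}

final class ComparatorDeduplicator {
    // Two metrics count as duplicates when name and type match; dates are ignored.
    static final Comparator<TimedMetric> IDENTITY =
            Comparator.comparing(TimedMetric::getMetricName)
                    .thenComparing(TimedMetric::getType);

    // Orders metrics from the most recent lastSeenTime to the oldest.
    static final Comparator<TimedMetric> NEWEST_FIRST =
            Comparator.comparing(TimedMetric::getLastSeenTime).reversed();

    static List<TimedMetric> removeDuplicates(List<TimedMetric> metrics,
                                              Comparator<TimedMetric> identity) {
        List<TimedMetric> copy = new ArrayList<>(metrics);
        copy.sort(NEWEST_FIRST);
        Set<TimedMetric> unique = new TreeSet<>(identity);
        for (TimedMetric metric : copy) {
            // add() is a no-op when an equal element is already present,
            // so the newest metric of each group is the one retained.
            unique.add(metric);
        }
        return new ArrayList<>(unique);
    }
}
```

Calling ComparatorDeduplicator.removeDuplicates(metrics, ComparatorDeduplicator.IDENTITY) returns the survivors ordered by the identity comparator (name, then type), not in their original order. A different comparator can be passed in to change what counts as a duplicate.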
Upvotes: 1