Reputation: 1304
I'm writing some code to de-duplicate data based on 2 fields:
I've created a POJO to hold this data and work as key for a TreeMap
. The full set of data is held in the value - this way I only keep relevant data in memory.
However, the next requirement is to have variability in the UMI AND the integers. For example, the following two pieces of data would be considered duplicates based on the UMI having a variability(mismatch) of 1.
a. "AAA", [200,300]
b. "ABA", [200,300]
Similarly, the following would be considered duplicates based on the integer array, given a mismatch allowance of 2.
a. "AAA", [201,300]
b. "AAA", [203,300]
My current attempt has been to make this POJO implement the Comparable
interface, and attempt to work the compareTo
method to take into account the variability:
public class UMIPrimoKey implements Comparable<UMIPrimoKey> {
private final String UMI;
private final int[] ints;
private final int umiMisMatch;
private final int posMisMatch;
public UMIPrimoKey(String UMI, int[] ints, int umiMisMatch, int posMisMatch) {
this.UMI = UMI;
this.ints = ints;
this.umiMisMatch = umiMisMatch;
this.posMisMatch = posMisMatch;
}
@Override
public int compareTo(UMIPrimoKey o) {
if (!Arrays.equals(ints, o.ints)) {
if (ints.length == o.ints.length) {
for (int i = 0; i < ints.length; i++) {
if (Math.abs(ints[i] - o.ints[i]) > posMisMatch) {
return -1;
}
}
} else {
return -1;
}
}
if (XsamStringUtils.numberOfDifferences(UMI, o.UMI) <= umiMisMatch) {
return 0;
}
return 1;
}
}
XsamStringUtils.numberOfDifferences
is just a simple static method to count the number of differences between the two UMIs.
I return -1 if any two integers from the array have a difference greater than the allowed mismatches (posMisMatch
). 0 is returned if the integers are allowed, and the number of mismatches in the UMI is less than the allowed amount, specified by umiMisMatch
.
Otherwise, 1 is returned as the UMIs don't match.
I've then used this in a TreeMap
which takes into account the compareTo
method.
This works in my unit tests, with small numbers of UMIPrimoKey
s added to it, but I'm getting some strange results when running the completed program. It's probably due to the rules for the method outlined here: https://docs.oracle.com/javase/8/docs/api/java/lang/Comparable.html but i'm finding it hard to adapt the code to take the rules into account.
Any direction is appreciated, thanks for reading!
Upvotes: 0
Views: 75
Reputation: 847
According to the docs of compareTo:
The implementor must ensure sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for all x and y. (This implies that x.compareTo(y) must throw an exception iff y.compareTo(x) throws an exception.)
The implementor must also ensure that the relation is transitive: (x.compareTo(y)>0 && y.compareTo(z)>0) implies x.compareTo(z)>0.
Finally, the implementor must ensure that x.compareTo(y)==0 implies that sgn(x.compareTo(z)) == sgn(y.compareTo(z)), for all z.
I think that's not true to your code, and that could cause problems with the get function not finding your entry
Upvotes: 2