Sam
Sam

Reputation: 1304

How to implement a key-value pair with variability in the key

I'm writing some code to de-duplicate data based on 2 fields:

  1. A string of characters, we'll call this the UMI
  2. An array of integers

I've created a POJO to hold this data and work as key for a TreeMap. The full set of data is held in the value - this way I only keep relevant data in memory.

However, the next requirement is to have variability in the UMI AND the integers. For example, the following two pieces of data would be considered duplicates based on the UMI having a variability(mismatch) of 1.

a. "AAA", [200,300]

b. "ABA", [200,300]

Similarly, the following would be considered duplicates based on the integer array, given a mismatch allowance of 2.

a. "AAA", [201,300]

b. "AAA", [203,300]

My current attempt has been to make this POJO implement the Comparable interface, and attempt to work the compareTo method to take into account the variability:

public class UMIPrimoKey implements Comparable<UMIPrimoKey> {

    private final String UMI;
    private final int[] ints;
    private final int umiMisMatch;
    private final int posMisMatch;

    public UMIPrimoKey(String UMI, int[] ints, int umiMisMatch, int posMisMatch) {
        this.UMI = UMI;
        this.ints = ints;
        this.umiMisMatch = umiMisMatch;
        this.posMisMatch = posMisMatch;
    }

    @Override
    public int compareTo(UMIPrimoKey o) {
        if (!Arrays.equals(ints, o.ints)) {
            if (ints.length == o.ints.length) {
                for (int i = 0; i < ints.length; i++) {
                    if (Math.abs(ints[i] - o.ints[i]) > posMisMatch) {
                        return -1;
                    }
                }
            } else {
                return -1;
            }
        }

        if (XsamStringUtils.numberOfDifferences(UMI, o.UMI) <= umiMisMatch) {
            return 0;
        }

        return 1;
    }
}

XsamStringUtils.numberOfDifferences is just a simple static method to count the number of differences between the two UMIs.

I return -1 if any two integers from the array have a difference greater than the allowed mismatches (posMisMatch). 0 is returned if the integers are allowed, and the number of mismatches in the UMI is less than the allowed amount, specified by umiMisMatch.

Otherwise, 1 is returned as the UMIs don't match.

I've then used this in a TreeMap which takes into account the compareTo method.

This works in my unit tests, with small numbers of UMIPrimoKeys added to it, but I'm getting some strange results when running the completed program. It's probably due to the rules for the method outlined here: https://docs.oracle.com/javase/8/docs/api/java/lang/Comparable.html but i'm finding it hard to adapt the code to take the rules into account.

Any direction is appreciated, thanks for reading!

Upvotes: 0

Views: 75

Answers (1)

aBnormaLz
aBnormaLz

Reputation: 847

According to the docs of compareTo:

The implementor must ensure sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for all x and y. (This implies that x.compareTo(y) must throw an exception iff y.compareTo(x) throws an exception.)

The implementor must also ensure that the relation is transitive: (x.compareTo(y)>0 && y.compareTo(z)>0) implies x.compareTo(z)>0.

Finally, the implementor must ensure that x.compareTo(y)==0 implies that sgn(x.compareTo(z)) == sgn(y.compareTo(z)), for all z.

I think that's not true to your code, and that could cause problems with the get function not finding your entry

Upvotes: 2

Related Questions