Dragon
Dragon

Reputation: 2481

Java: Filter collection and retrieve data by multiple fields

I have a class:

public class Address {
    private String country;
    private String state;
    private String city;
}

And there are a list of Person objects. Person class looks like:

public class Person {
    private String country;
    private String state;
    private String city;
    //other fields
}

I need to filter Person objects and get the most suitable one. Address object can have at least one not null field. Person object can have none, partially or all mentioned fields initialized.

Here is one of the possible input examples:

Three Person objects:
a. PersonA: country = 'A'
b. PersonB: country = 'A', state = 'B'
c. PersonC: country = 'A', state = 'B', city = 'C'

Address object:
a. Address: country = 'A', state = 'B'

Expected result after filtering is PersonB. And in case when there're only PersonA and PersonC objects, then PersonA is more preferable.

I'd like to show how I tried to do this, but in fact it is pure brute force algorithm and I dislike it. Algorithm complexity increases with newly added field. I also thought about using guava filter by predicate, but had no idea what predicate should be.

What is preferable algorithm for such filtering if there's any besides brute force?

Upvotes: 1

Views: 982

Answers (2)

bashnesnos
bashnesnos

Reputation: 816

To avoid brute-force you would need to index you persons by address. For a good search you would definitely need a country (guess it or default it somehow, otherwise results would anyway be too inaccurate).

The index would be a number, first 3 digits for a country, next 3 digits for a state and last 4 digits for a city. In such a case in an int you would be able to store 213 countries (only 206 as of 2016), with up to 999 states and 9999 cities.

It gives us ability to use hashCode and a TreeSet to index your Person instances and look them up partially by address in a O(log(n)) manner without touching their fields. Fields would be touched on a TreeSet construction, and you would need to add some extra logic for a modification of a Person to keep the index intact.

The index is calculated sequntially for each part, starting from country

    import java.util.HashMap;
    import java.util.Map;

    public class PartialAddressSearch {

        private final static Map<String, AddressPartHolder> COUNTRY_MAP = new HashMap<>(200);

        private static class AddressPartHolder {
            int id;
            Map<String, AddressPartHolder> subPartMap;

            public AddressPartHolder(int id, Map<String, AddressPartHolder> subPartMap) {
                this.id = id;
                this.subPartMap = subPartMap;
            }
        }

        public static int getCountryStateCityHashCode(String country, String state, String city) {
            if (country != null && country.length() != 0) {
                int result = 0;
                AddressPartHolder countryHolder = COUNTRY_MAP.get(country);
                if (countryHolder == null) {
                    countryHolder = new AddressPartHolder(COUNTRY_MAP.size() + 1, new HashMap<>());
                    COUNTRY_MAP.put(country, countryHolder);
                }
                result += countryHolder.id * 10000000;

                if (state != null) {
                    AddressPartHolder stateHolder = countryHolder.subPartMap.get(state);
                    if (stateHolder == null) {
                        stateHolder = new AddressPartHolder(countryHolder.subPartMap.size() + 1, new HashMap<>());
                        countryHolder.subPartMap.put(state, stateHolder);
                    }
                    result += stateHolder.id * 10000;

                    if (city != null && city.length() != 0) {
                        AddressPartHolder cityHolder = stateHolder.subPartMap.get(city);
                        if (cityHolder == null) {
                            cityHolder = new AddressPartHolder(stateHolder.subPartMap.size() + 1, null);
                            stateHolder.subPartMap.put(city, cityHolder);
                        }
                        result += cityHolder.id;
                    }
                }

                return result;
            } else {
                throw new IllegalArgumentException("Non-empty country is expected");
            }
    }

For your Person and Address classes you define hashCode and compareTo basing on a natural order of int:

    public class Person implements Comparable {
        private String country;
        private String state;
        private String city;

        @Override
        public boolean equals(Object o) {
             //it's important but I removed it for readability
        }

        @Override
        public int hashCode() {
            return getCountryStateCityHashCode(country, state, city);
        }

        @Override
        public int compareTo(Object o) {
            //could be further improved by storing hashcode in a field to avoid re-calculation on sorting
            return hashCode() - o.hashCode();
        }

    }

    public class Address implements Comparable {
        private String country;
        private String state;
        private String city;


        @Override
        public boolean equals(Object o) {
             //removed for readability
        }

        @Override
        public int hashCode() {
            return getCountryStateCityHashCode(country, state, city);
        }

        @Override
        public int compareTo(Object o) {
            //could be further improved by storing hashcode in a field to avoid re-calculation on sorting
            return hashCode() - o.hashCode();
        }

    }

    public class AddressPersonAdapter extends Person {
        private final Address delegate;

        public AddressPersonAdapter(Address delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean equals(Object o) {
            return delegate.equals(o);
        }

        @Override
        public int hashCode() {
            return delegate.hashCode();
        }
    }

After that your filtering code would shrink to filling the index and calculating floor for your partial Address:

    TreeSet<Person> personSetByAddress = new TreeSet<>();
    Person personA = new Person();
    personA.setCountry("A");
    personSetByAddress.add(personA);
    Person personB = new Person();
    personB.setCountry("A");
    personB.setState("B");
    personSetByAddress.add(personB);
    Person personC = new Person();
    personC.setCountry("A");
    personC.setState("B");
    personC.setCity("C");
    personSetByAddress.add(personC);

    Address addressAB = new Address();
    addressAB.setCountry("A");
    addressAB.setState("B");

    System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));

    Yields:
    Person{hashCode=10010000, country='A', state='B', city='null'}

And if you won't have PersonB:

    TreeSet<Person> personSetByAddress = new TreeSet<>();
    Person personA = new Person();
    personA.setCountry("A");
    personSetByAddress.add(personA);
    Person personC = new Person();
    personC.setCountry("A");
    personC.setState("B");
    personC.setCity("C");
    personSetByAddress.add(personC);

    Address addressAB = new Address();
    addressAB.setCountry("A");
    addressAB.setState("B");

    System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));

    Yields:
    Person{hashCode=10000000, country='A', state='null', city='null'}

EDIT:

Corner-case which will require additional validation would be absence of a greater(or lesser if we need ceiling) element within the same country. E.g.:

    TreeSet<Person> personSetByAddress = new TreeSet<>();
    Person personA = new Person();
    personA.setCountry("D");
    personSetByAddress.add(personA);
    Person personC = new Person();
    personC.setCountry("A");
    personC.setState("B");
    personC.setCity("C");
    personSetByAddress.add(personC);

    Address addressAB = new Address();
    addressAB.setCountry("A");
    addressAB.setState("B");

    System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));

    Yields:
    Person{hashCode=10000000, country='D', state='null', city='null'}

I.e. we fall out to the closest country. To fix this, we would need to check that the country digit is still the same. We can do it by sub-classing the TreeSet and adding this check there:

 //we need this class to allow flooring just by id
 public class IntegerPersonAdapter extends Person {
    private Integer id;
    public IntegerPersonAdapter(Integer id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return id.equals(o);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }

    @Override
    public int compareTo(Object o) {
        return id.hashCode() - o.hashCode();
    }

    @Override
    public String toString() {
        return id.toString();
    }
}

public class StrictCountryTreeSet extends TreeSet<Person> {

    @Override
    public Person floor(Person e) {
        Person candidate = super.floor(e);
        if (candidate != null) {
            //we check if the country is the same
            int candidateCode = candidate.hashCode();
            int eCode = e.hashCode();
            if (candidateCode == eCode) {
                return candidate;
            } else {
                int countryCandidate = candidateCode / 10000000;
                if (countryCandidate == (eCode / 10000000)) {
                    //we check if the state is the same
                    int stateCandidate = candidateCode / 10000;
                    if (stateCandidate == (eCode / 10000)) {
                        //we check if is a state
                        if (candidateCode % 10 == 0) {
                            return candidate;
                        } else { //since it's not exact match we haven't found a city - we need to get someone just from state
                            return this.floor(new IntegerPersonAdapter(stateCandidate * 10000));
                        }

                    } else if (stateCandidate % 10 == 0) { //we check if it's a country already
                        return candidate;
                    } else {
                        return this.floor(new IntegerPersonAdapter(countryCandidate * 10000000));
                    }
                }
            }
        }
        return null;
    }

Now our test case would yield null after we intialize a StrictCountryTreeSet:

    TreeSet<Person> personSetByAddress = new StrictCountryTreeSet();
    Person personA = new Person();
    personA.setCountry("D");
    personSetByAddress.add(personA);
    Person personC = new Person();
    personC.setCountry("A");
    personC.setState("B");
    personC.setCity("C");
    personSetByAddress.add(personC);

    Address addressAB = new Address();
    addressAB.setCountry("A");
    addressAB.setState("B");

    System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));

    Yields:
    null

And a test for a different state would yield null as well:

    TreeSet<Person> personSetByAddress = new StrictCountryTreeSet();
    Person personD = new Person();
    personD.setCountry("D");
    personSetByAddress.add(personD);

    Person personE = new Person();
    personE.setCountry("A");
    personE.setState("E");
    personSetByAddress.add(personE);

    Person personC = new Person();
    personC.setCountry("A");
    personC.setState("B");
    personC.setCity("C");
    personSetByAddress.add(personC);

    Address addressA = new Address();
    addressA.setCountry("A");

    Address addressAB = new Address();
    addressAB.setCountry("A");
    addressAB.setState("B");

    Address addressABC = new Address();
    addressABC.setCountry("A");
    addressABC.setState("B");
    addressABC.setCity("C");

    System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));

    Yields:
    null

Please note, that in this case you would need to store hashCode results within the Address and Person clasess to avoid it re-calculation.

Upvotes: 1

xenteros
xenteros

Reputation: 15852

As I understand, by brute-force you mean checking all fields of all entities. Well, if you won't refactor your classes, it's not possible, but there is a simple trick which will help. It uses state pattern.

You can add flag notNulls to both classes:

public class Address {
    private int notNulls = 0;
    private String country;
    private String state;
    private String city;
}

public class Person {
    private int notNulls = 0;
    private String country;
    private String state;
    private String city;
    //other fields
}

I'll show you possible implementation of a one setter as the rest is similar:

public void setCountry(String s) {
    if (country == null {
        if (s != null) {
            country = s;
            notNulls++;
        }
    } else {
        if (s == null) {
            country == null;
            notNulls--;
        } else {
            country = s;
        }
    }
}

public boolean isValid() {
    return notNulls != 0;
}

Now you can simply loop through the objects.

Upvotes: 1

Related Questions