Gandalf StormCrow
Gandalf StormCrow

Reputation: 26202

Filter out items from Set in Java

I have a list of items(i.e Strings) which I need to sort/filter.

The end result should not contain any duplicate (easy), I'll put them all in the Set. So I have a Set of strings now.

more explanation..

I also have a method x that calculates the amount of difference between two Strings (using levenstein distance).

Question:

Before inserting new String string into my Set set I want to check for levenstein distance using method x between string and any other String in the set and if x returns >=3 than I should not add it.

What is my best shot of doing this? Except iterate trough set for each string to be inserted?

Upvotes: 5

Views: 4709

Answers (3)

Attila
Attila

Reputation: 28762

You can use a custom comparator when creating the set. In your comparator you return that two strings are the same if they are the same (as per regular string comparison rules) or if their Levenstein distance fulfills your criteria.

When your comaprator says two strings are the same, the new string is not inserted into the set. (Note that this means the end result of the string might depend on the order of insertion)

Update: Addressing comments about total ordering:

Using a comparator like the one suggested above would make the endresult dependent on the order of insertion (as noted above), as would any other solution as the Levenstein distance criteria used does not define total ordering.

OTOH, once a string passes the not-equal test and is inserted into the set, no other string in the set will compare equal to this one, so the strings in the set will use their natural string ordering, which does define total ordering, so no further inconsistencies arise within the set's internal operations (e.g. sorting).

Upvotes: 1

Edwin Dalorzo
Edwin Dalorzo

Reputation: 78579

I have played with my idea of how to do it. I cannot think of a way to do this without any amount of iteration.

Supposing you have method named distance(String,String):int that returns the given distance between two Strings.

String x = "Obi-wan"; //this is the item subject to eval addition
List<String> items = new ArrayList<String>(asList("Luke","Yoda","Anakin"));
if (items.filter(s -> distance(s, x) >= 3).getFirst() == null) {
  items.add(x);
}

If you use JDK8 Preview you can do this in no time using exactly the code above. The Iterables.getFirst() method would not iterate the whole collection, but only until the first element satisfying the criteria is found.

Otherwise you will probably have to implement a Predicate interface and filtering method.

interface Predicate<T> {
    public boolean eval(T o);
}

public static void main(String[] args) {
   final String x = "Obi-wan"; //this is the item subject to eval addition
   List<String> items = new ArrayList<String>(asList("Luke","Yoda","Anakin"));
   Predicate<String> p = new Predicate<String>() {
       public boolean eval(String s){ 
           return distance(s, x) >= 3;
       }
   };
   if(filter(items, p).isEmpty()){ 
        items.add(x);
   }
}

public static <T> List<T> filter(List<? extends T> items, Predicate<? super T> predicate){
    List<T> destiny = new ArrayList<T>();
    for(T item : items){
       if(predicate.eval(item){
           destiny.add(item);
       }
    }
    return destiny;
}

Alternatively, you could stop filtering once you find the first item that satisfies your criteria.

Upvotes: 2

Louis Wasserman
Louis Wasserman

Reputation: 198043

Iterating through the Set is going to be your best bet, since there isn't any built-in Set implementation that would help you narrow the possibilities.

Upvotes: 2

Related Questions