Reputation: 31020

remove duplicate strings in a List in Java

Update: I guess HashSet.add(Object obj) does not call contains(). Is there a way to implement what I want (remove duplicate strings, ignoring case, using a Set)?

Original question: I am trying to remove duplicates from a list of Strings in Java. However, in the following code CaseInsensitiveSet.contains(Object ob) is not getting called. Why?

public static List<String> removeDupList(List<String>list, boolean ignoreCase){
    Set<String> set = (ignoreCase?new CaseInsensitiveSet():new LinkedHashSet<String>());
    set.addAll(list);

    List<String> res = new Vector<String>(set);
    return res;
}


public class CaseInsensitiveSet  extends LinkedHashSet<String>{

    @Override
    public boolean contains(Object obj){
        // This is not getting called.
        if(obj instanceof String){

            return super.contains(((String)obj).toLowerCase());
        }
        return super.contains(obj);
    }

}

Upvotes: 3

Answers (5)

Rahul

Reputation: 16335

The add() method of LinkedHashSet does not call contains() internally; otherwise your method would have been called as well.

Instead of a LinkedHashSet, why don't you use a SortedSet with a case-insensitive comparator, such as String.CASE_INSENSITIVE_ORDER?

Your code then reduces to:

public static List<String> removeDupList(List<String>list, boolean ignoreCase){
    Set<String> set = (ignoreCase?new TreeSet<String>(String.CASE_INSENSITIVE_ORDER):new LinkedHashSet<String>());
    set.addAll(list);

    List<String> res = new ArrayList<String>(set);
    return res;
}
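
One thing to be aware of with the TreeSet variant: the result comes back in the comparator's (case-insensitive, sorted) order rather than in input order. The following is a minimal sketch to illustrate that; the class name TreeSetOrderDemo, the method name dedupSorted, and the sample strings are made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TreeSetOrderDemo {
    // Deduplicate ignoring case; the TreeSet keeps the first spelling it sees
    // but iterates in comparator order, not insertion order.
    static List<String> dedupSorted(List<String> list) {
        Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(list);
        return new ArrayList<String>(set);
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        List<String> input = Arrays.asList("pear", "Apple", "PEAR", "apple");
        System.out.println(dedupSorted(input)); // [Apple, pear]
    }
}
```

The first occurrences in the input are "pear" then "Apple", but the output lists "Apple" first because the set is sorted.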

If you wish to preserve the order, as @tom anderson pointed out in his comment, you can use an auxiliary ordered collection (an ArrayList in the code below) alongside the TreeSet.

Try adding each element to the TreeSet: if add() returns true, also append the element to the ordered list; otherwise skip it.

public static List<String> removeDupList(List<String> list) {
    Set<String> sortedSet = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
    List<String> orderedList = new ArrayList<String>();
    for (String str : list) {
        // add() returns true if the element was not already present, else false
        if (sortedSet.add(str)) {
            orderedList.add(str);
        }
    }
    return orderedList;
}

Upvotes: 3

Evgeniy Dorofeev

Reputation: 136002

Try

        Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(list);
        return new ArrayList<String>(set);

UPDATE: As Tom Anderson mentioned, this does not preserve the initial order. If that is really an issue, try the following, which removes the duplicates from the list in place:

    Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
    Iterator<String> i = list.iterator();
    while (i.hasNext()) {
        String s = i.next();
        if (set.contains(s)) {
            i.remove();
        }
        else {
            set.add(s);
        }
    }

prints

[2, 1]

Upvotes: 8

Tom Anderson

Reputation: 47183

Here's another approach, using a HashSet of the strings for deduplication, but building the result list directly:

public static List<String> removeDupList(List<String> list, boolean ignoreCase) {
    HashSet<String> seen = new HashSet<String>();
    ArrayList<String> deduplicatedList = new ArrayList<String>();
    for (String string : list) {
        if (seen.add(ignoreCase ? string.toLowerCase() : string)) {
            deduplicatedList.add(string);
        }
    }
    return deduplicatedList;
}

This is fairly simple, makes only one pass over the elements, and does only a lowercase, a hash lookup, and then a list append for each element.
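
As a quick illustration of how this behaves, here is a small self-contained sketch that wraps the same logic in a class and runs it on sample input. The class name DedupDemo and the sample strings are invented for the example. Note that toLowerCase() without arguments uses the default locale, which is fine for this data but can matter for some languages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DedupDemo {
    // Same logic as above: keep the first occurrence of each string,
    // comparing on the lower-cased form when ignoreCase is true.
    static List<String> removeDupList(List<String> list, boolean ignoreCase) {
        Set<String> seen = new HashSet<String>();
        List<String> deduplicatedList = new ArrayList<String>();
        for (String string : list) {
            if (seen.add(ignoreCase ? string.toLowerCase() : string)) {
                deduplicatedList.add(string);
            }
        }
        return deduplicatedList;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        List<String> input = Arrays.asList("b", "A", "a", "B", "c");
        System.out.println(removeDupList(input, true));  // [b, A, c]
        System.out.println(removeDupList(input, false)); // [b, A, a, B, c]
    }
}
```

The input order is preserved and the first spelling of each string wins.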

Upvotes: 0

Yogesh Patil

Reputation: 888

Try

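    @Override
    public boolean addAll(Collection<? extends String> c) {
        boolean changed = false;
        for (String s : c) {
            // Route every element through contains() so the custom check runs.
            if (!this.contains(s)) {
                changed |= this.add(s);
            }
        }
        // Do not call super.addAll(c) here: it would add the duplicates again.
        return changed;
    }

    @Override
    public boolean contains(Object o) {
        // Do your checking here
        return super.contains(o);
    }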

This will make sure the contains method is called if you want the code to go through there.

Upvotes: 0

Peter Lawrey

Reputation: 533500

contains() is not called because LinkedHashSet is not implemented that way.

If you want add() to call contains(), you will need to override add() as well.

The reason it is not implemented this way is that calling contains() first would mean performing two lookups instead of one, which would be slower.
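
For what it's worth, here is one possible sketch of such an override. It is not the only way to do it: instead of calling contains() from add(), it keeps a second set of lower-cased keys so that add() still does a single duplicate check. The field name lowerCaseKeys and the demo class are invented for this example, elements are assumed to be non-null, and remove()/clear() are left out for brevity, so they would not keep the key set in sync.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class CaseInsensitiveSet extends LinkedHashSet<String> {
    // Hypothetical helper field: lower-cased form of every stored string.
    private final Set<String> lowerCaseKeys = new HashSet<String>();

    @Override
    public boolean add(String s) {
        // A single lookup on the key set decides whether s is a duplicate.
        if (!lowerCaseKeys.add(s.toLowerCase())) {
            return false;
        }
        return super.add(s); // stores the original spelling, in insertion order
    }

    @Override
    public boolean contains(Object obj) {
        if (obj instanceof String) {
            return lowerCaseKeys.contains(((String) obj).toLowerCase());
        }
        return false;
    }
}

class CaseInsensitiveSetDemo {
    public static void main(String[] args) {
        Set<String> set = new CaseInsensitiveSet();
        // The inherited addAll() calls add() for each element, so the override applies.
        set.addAll(Arrays.asList("Foo", "foo", "Bar", "FOO"));
        System.out.println(new ArrayList<String>(set)); // [Foo, Bar]
        System.out.println(set.contains("BAR"));        // true
    }
}
```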

Upvotes: 5
